COMPUTATIONAL METHOD FOR IDENTIFYING ADHESIN AND
ADHESIN-LIKE PROTEINS OF THERAPEUTIC POTENTIAL

Field of the present Invention 

A computational method for identifying adhesin and adhesin-like proteins; a computer
system for performing the method; and genes and proteins encoding adhesin and
adhesin-like proteins.

Background of the present Invention

The progress in genome sequencing projects has generated a large number of inferred
protein sequences from different organisms. It is expected that the availability of the
information on the complete set of proteins from infectious human pathogens will
enable us to develop novel molecular approaches to combat them. A necessary step in
the successful colonization and subsequent manifestation of disease by microbial
pathogens is the ability to adhere to host cells.

Microbial pathogens encode several proteins known as adhesins that mediate their
adherence to host cell surface receptors, membranes, or extracellular matrix for
successful colonization. Investigations into this primary event of host-pathogen
interaction over the past decades have revealed a wide array of adhesins in a variety of
pathogenic microbes. Presently, substantial information on the biogenesis of adhesins
and the regulation of adhesin factors is available. One of the best understood
mechanisms of bacterial adherence is attachment mediated by pili or fimbriae. Several
afimbrial adhesins have also been reported. In addition, limited knowledge of the target
host receptors has been gained (Finlay, B.B. and Falkow, S. 1997).

New approaches to vaccine development focus on targeting adhesins to abrogate the
colonization process (Wizemann et al 1999). However, the specific role of particular
adhesins has been difficult to elucidate. Thus, prediction of adhesins or adhesin-like
proteins and their functional characterization is likely to aid not only in deciphering the
molecular mechanisms of host-pathogen interaction but also in developing new vaccine
formulations, which can be tested in suitable experimental model systems.

One of the best understood mechanisms of bacterial adherence is attachment mediated
by pili or fimbriae, for example by the FimH and PapG adhesins of Escherichia coli
(Maurer, L. and Orndorff, P. 1987; Bock, K. et al 1985). Other examples of pili group
adhesins include type IV pili in Pseudomonas aeruginosa, Neisseria species, Moraxella
species, Enteropathogenic Escherichia coli and Vibrio cholerae (Sperandio V et al 1996).
Several afimbrial adhesins are known, such as the HMW proteins of Haemophilus
influenzae (van Schilfgaarde 2000), the filamentous hemagglutinin and pertactin of
Bordetella pertussis (Bassinet et al 2000), the BabA of H. pylori (Yu J et al 2002) and
the YadA adhesin of Yersinia enterocolitica (Neubauer et al 2000). The intimin receptor
protein (Tir) of Enteropathogenic E. coli (EPEC) is another type of adhesin (Ide T et al
2003). Other classes of adhesins include the MrkD protein of Klebsiella pneumoniae,
Hia of H. influenzae (St Geme et al 2000), Ag I/II of Streptococcus mutans and SspA,
SspB of Streptococcus gordonii (Egland et al 2001), FnbA, FnbB of Staphylococcus
aureus, SfbI, protein F of Streptococcus pyogenes, and the PsaA of Streptococcus
pneumoniae (De et al 2003).

A known example of adhesins approved as a vaccine is the acellular pertussis vaccine
containing FHA and pertactin against B. pertussis, the causative agent of whooping
cough (Halperin, S et al 2003). Immunization with FimH is being evaluated for
protective immunity against pathogenic E. coli (Langermann S et al 2000). In
Streptococcus pneumoniae, PsaA is being investigated as a potential vaccine candidate
against pneumococcal disease (Rapola, S et al 2003). Immunization results with BabA
adhesin showed promise for developing a vaccine against H. pylori (Prinz, C et al
2003). A synthetic peptide sequence anti-adhesin vaccine is being evaluated for
protection against Pseudomonas aeruginosa infections.

Screening for adhesin and adhesin-like proteins by conventional experimental methods
is laborious, time consuming and expensive. As an alternative, homology search is used
to facilitate the identification of adhesins. Although this procedure is useful in the
analysis of genome organization (Wolf et al 2001) and of metabolic pathways
(Peregrin-Alvarez et al 2003, Rison et al 2002), it is somewhat limited in allowing
functional predictions when the homologues are not functionally characterized or the
sequence divergence is high. Assignment of functional roles to proteins based on this
technique has been possible for only about 60% of the predicted protein sequences
(Fraser et al 2000). Thus, we explored the possibility of developing a non-homology
method based on sequence composition properties combined with the power of
Artificial Neural Networks to identify adhesins and adhesin-like proteins in species
belonging to a wide phylogenetic spectrum.

Twenty years ago, Nishikawa et al carried out some of the early attempts to classify
proteins into different groups based on compositional analysis (Nishikawa et al 1983).
More recently, the software PropSearch was developed for analyzing protein sequences
where conventional alignment tools fail to identify significantly similar sequences
(Hobohm, U. and Sander, C. 1995). PropSearch uses 144 compositional properties of
protein sequences to detect possible structural or functional relationships between a
new sequence and sequences in the database. Recently the compositional attributes of
proteins have been used to develop software for predicting secretory proteins in
bacteria and apicoplast targeted proteins in Plasmodium falciparum by training
Artificial Neural Networks (Zuegge et al 2001).

Zuegge et al have used the 20 amino acid compositional properties. Their objective
was to extract features of apicoplast targeted proteins in Plasmodium falciparum. This
is distinct from our software SPAAN that focuses on adhesins and adhesin-like proteins
involved in host-pathogen interaction.

Hobohm and Sander have used 144 compositional properties including isoelectric point
and amino acid and dipeptide composition to generate hypotheses on the putative
functional role of proteins that are refractory to analysis using other sequence alignment
based approaches like BLAST and FASTA. Hobohm and Sander do not specifically
address the issue of adhesins and adhesin-like proteins, which is the focus of SPAAN.
Nishikawa et al had originally attempted to classify proteins into various functional
groups. This was a curiosity driven exercise but eventually led to the development of
software to discriminate extra-cellular proteins from intracellular proteins. This work
did not address the issue of adhesins and adhesin-like proteins, which is the focus of
SPAAN.

Thus, none of the aforementioned research groups have been able to envisage the
methodology of the instant application. The inventive method of this application
provides novel proteins and corresponding gene sequences.

Adhesins and adhesin-like proteins mediate host-pathogen interactions. This is the first
step in colonization of a host by microbial pathogens. Attempts worldwide are focused
on designing vaccine formulations comprising adhesin proteins derived from
pathogens. When immunized, the host will have its immune system primed against
the adhesins of that pathogen. When the pathogen is actually encountered, the
surveillance mechanism will recognize these adhesins, bind them through antigen-antibody
interactions and neutralize the pathogen through the complement-mediated cascade and
other related clearance mechanisms. This strategy has been successfully employed in
the case of whooping cough and is being actively pursued in the case of pneumonia,
gastric ulcer and urinary tract infections.

Objects of the present Invention

The main object of the present invention is to provide a computational method for
identifying adhesin and adhesin-like proteins of therapeutic potential.

Another object of the invention is to provide a method for screening the proteins with
unique compositional characteristics as putative adhesins in different pathogens.

Yet another object of the invention is to provide the use of gene sequences encoding
the putative adhesin proteins useful as preventive therapeutics.

Summary of the present Invention

A computational method for identifying adhesin and adhesin-like proteins, said method
comprising steps of computing the sequence-based attributes of protein sequences using
five attribute modules of software SPAAN, (i) amino acid frequencies, (ii) multiplet
frequency, (iii) dipeptide frequencies, (iv) charge composition, and (v) hydrophobic
composition, training the artificial neural Network (ANN) for each of the computed
five attributes, and identifying the adhesin and adhesin-like proteins having probability
of being an adhesin (P_ad) as > 0.51; a computer system for performing the method; and
genes and proteins encoding adhesin and adhesin-like proteins.

Detailed description of the present Invention

Accordingly, the present invention relates to a computational method for identifying
adhesin and adhesin-like proteins, said method comprising steps of computing the
sequence-based attributes of protein sequences using five attribute modules of software
SPAAN, (i) amino acid frequencies, (ii) multiplet frequency, (iii) dipeptide frequencies,
(iv) charge composition, and (v) hydrophobic composition, training the artificial neural
Network (ANN) for each of the computed five attributes, and identifying the adhesin
and adhesin-like proteins having probability of being an adhesin (P_ad) as > 0.51; a
computer system for performing the method; and genes and proteins encoding adhesin
and adhesin-like proteins.

In an embodiment of the present invention, wherein the invention relates to a
computational method for identifying adhesin and adhesin-like proteins, said method
comprising steps of:

a. computing the sequence-based attributes of protein sequences using five
attribute modules of a neural network software, wherein the attributes
are (i) amino acid frequencies, (ii) multiplet frequency, (iii)
dipeptide frequencies, (iv) charge composition, and (v) hydrophobic
composition,

b. training the artificial neural Network (ANN) for each of the computed
five attributes, and

c. identifying the adhesin and adhesin-like proteins having probability of
being an adhesin (P_ad) as > 0.51.



In another embodiment of the present invention, wherein the invention relates to a
method wherein the protein sequences are obtained from pathogens, eukaryotes, and
multicellular organisms.

In an embodiment of the present invention, wherein the invention relates to a method,
wherein the protein sequences are obtained from the pathogens selected from a group
of organisms comprising Escherichia coli, Haemophilus influenzae, Helicobacter
pylori, Mycoplasma pneumoniae, Mycobacterium tuberculosis, Rickettsia prowazekii,
Porphyromonas gingivalis, Shigella flexneri, Streptococcus mutans, Streptococcus
pneumoniae, Neisseria meningitidis, Streptococcus pyogenes, Treponema pallidum and
Severe Acute Respiratory Syndrome associated human coronavirus (SARS).

In yet another embodiment of the present invention, wherein the method of the
invention is a non-homology method.

In still another embodiment of the present invention, wherein the invention relates to
the method using 105 compositional properties of the sequences.

In still another embodiment of the present invention, wherein the invention relates to a 
method showing sensitivity of at least 90%. 

In still another embodiment of the present invention, wherein the invention relates to
the method showing specificity of 100%.

In still another embodiment of the present invention, wherein the invention relates to a 
method identifying adhesins from distantly related organisms. 

In still another embodiment of the present invention, wherein the neural network has a
multi-layer feed-forward topology, consisting of an input layer, one hidden layer, and
an output layer.

In still another embodiment of the present invention, wherein the number of neurons in
the input layer is equal to the number of input data points for each attribute.

In still another embodiment of the present invention, wherein "P_ad" is a weighted
linear sum of the probabilities from the five computed attributes.

In still another embodiment of the present invention, wherein each trained network
assigns a probability value of being an adhesin for the protein sequence.

In still another embodiment of the present invention, wherein the invention relates to a
computer system for performing the method of claim 1, said system comprising a
central processing unit executing the SPAAN program, which gives probabilities based on
different attributes using an Artificial Neural Network and other in-built programs for
assessing attributes, all stored in a memory device accessed by the CPU; a display on
which the central processing unit displays the screens of the above mentioned programs
in response to user inputs; and a user interface device.

In still another embodiment of the present invention, wherein the invention relates to a
set of 274 annotated genes encoding adhesin and adhesin-like proteins, having SEQ ID
Nos. 385 to 658.

In still another embodiment of the present invention, wherein the invention relates to a 
set of 105 hypothetical genes encoding adhesin and adhesin-like proteins, having SEQ 
ID Nos. 659 to 763. 

In still another embodiment of the present invention, wherein the invention relates to a
set of 279 annotated adhesin and adhesin-like proteins of SEQ ID Nos. 1 to 279.

In still another embodiment of the present invention, wherein the invention relates to a
set of 105 hypothetical adhesin and adhesin-like proteins of SEQ ID Nos. 280 to 384.

One more embodiment of the present invention, wherein the invention also relates to a
fully connected multilayer feed forward Artificial Neural Network based on the
computational method as claimed in claim 1, comprising an input layer, a hidden
layer and an output layer which are connected in the said sequence, wherein each
neuron is a binary digit number and is connected to each neuron of the subsequent layer
for identifying adhesin or adhesin-like proteins, wherein the program steps comprise:

[a] feeding a protein sequence in FASTA format;

[b] processing the sequence obtained in step [a] through the 5 modules named A, C, D, H
and M, wherein attribute A represents an amino acid composition, attribute C represents
a charge composition, attribute D represents a dipeptide composition of the 20
dipeptides [NG, RE, TN, NT, GT, TT, DE, ER, RR, RK, RI, AT, TS, IV, SG, GS, TG, GN,
VI and HR], attribute H represents a hydrophobic composition and attribute M represents
amino acid frequencies in multiplets, to quantify 5 types of compositional attributes of
the said protein sequence to obtain numerical input vectors respectively for each of the
said attributes, wherein the sum of numerical input vectors is 105;

[c] processing of the numerical input vectors obtained in step [b] by the input neuron
layer to obtain signals, wherein the number of neurons is equal to the number of
numerical input vectors for each attribute;

[d] processing of signals obtained from step [c] by the hidden layer to obtain synaptic
weighted signals, wherein the optimal number of neurons in the hidden layer was
determined through experimentation for minimizing the error at the best epoch for each
network individually;

[e] delivering synaptic weighted signals obtained from step [d] to the output layer for
assigning a probability value for each protein sequence fed in step [a] as being an
adhesin by each network module;

[f] using the individual probabilities obtained from step [e] for computing the final
probability of a protein sequence being an adhesin, denoted by the P_ad value, which is a
weighted average of the individual probabilities obtained from step [e] and the
associated fraction of correlation, which is a measure of the strength of the prediction.

In still another embodiment of the present invention, wherein the input neuron layer
consists of a total of 105 neurons corresponding to 105 compositional properties.

In still another embodiment of the present invention, wherein the hidden layer
comprises neurons numbering 30 for amino acid frequencies, 28 for multiplet
frequencies, 28 for dipeptide frequencies, 30 for charge composition and 30 for
hydrophobic composition.

In still another embodiment of the present invention, wherein the output layer
comprises neurons to deliver the output values as a probability value for each protein
sequence.

Identification of novel adhesins and their characterization are important for studying
host-pathogen interactions and testing new vaccine formulations. We have employed
Artificial Neural Networks to develop an algorithm SPAAN (Software for Prediction of
Adhesin and Adhesin-like proteins using Neural Networks) that can identify adhesin
proteins using 105 compositional properties of a protein sequence. SPAAN could
correctly predict well characterized adhesins from several bacterial species and strains.
SPAAN showed 89% sensitivity and 100% specificity in a test data set that did not
contain proteins in the training set. Putative adhesins identified by the software can
serve as potential preventive therapeutics.

The present invention provides a novel computational method for identifying adhesin
and adhesin-like proteins of therapeutic potential. More particularly, the present
invention relates to candidate genes for these adhesins. The invention further provides
new leads for development of candidate genes, and their encoded proteins, in their
functional relevance to preventive approaches. This computational method involves
calculation of several sequence attributes, and their subsequent analysis leads to the
identification of adhesin proteins in different pathogens. Thus, the present invention is
useful for identification of the adhesin proteins in pathogenic organisms. The adhesin
proteins from different genomes constitute a set of candidates for functional
characterization through targeted gene disruption, microarrays and proteomics. Further,
these proteins constitute a set of candidates for further testing in development of
preventive therapeutics. Also provided are the genes encoding the candidate adhesin
proteins.

The present method offers novelty in the principles used and the power of Neural
Networks to identify new adhesins compared to laborious and time consuming
conventional methods. The present method is based on compositional properties of
proteins instead of sequence alignments. Therefore this method has the ability to
identify adhesin and adhesin-like proteins from bacteria belonging to a wide
phylogenetic spectrum. The predictions made from this method are readily verifiable
through independent analysis and experimentation. The invention has the potential to
accelerate the development of new preventive therapeutics, which currently requires
high investment in terms of skilled labor and valuable time.

The present invention relates to a computational method for the identification of
candidate adhesin proteins of therapeutic potential. The invention particularly describes
a novel method to identify adhesin proteins in different genomes of pathogens. These
adhesin proteins can be used for developing preventive therapeutics.

Accordingly, the invention provides a computational method for identifying adhesin and
adhesin-like proteins of therapeutic potential, which comprises calculation of 105
compositional properties under the five sequence attributes, namely, Amino Acid
frequency, Multiplet frequency, dipeptide frequency, charge composition and hydrophobic
composition; and then training an Artificial Neural Network (ANN, Feed Forward Error
Back Propagation) using these properties for differentiating between the adhesin and
non-adhesin classes of proteins. This computational method involves quantifying 105
compositional attributes of query proteins and qualifying them as adhesins or
non-adhesins by a P_ad value (Probability of being an adhesin). The present invention is
useful for identification of adhesin and adhesin-like proteins in pathogenic organisms.
These newly identified adhesin and adhesin-like proteins constitute a set of candidates
for development of new preventive therapeutics that can be tested readily in suitable
experimental model systems. In addition, the genes encoding the candidate adhesin and
adhesin-like proteins are provided.

The invention provides a set of candidate adhesin and adhesin-like proteins and their
coding genes for further evaluation as preventive therapeutics. The method of the
invention is based on the analysis of protein sequence attributes instead of sequence
patterns classified to functional domains. The present method is less dependent on
sequence relationships and therefore offers the potential power of identifying adhesins
from distantly related organisms. The invention provides a computational method, which
involves prediction of adhesin and adhesin-like proteins using Artificial Neural
Networks. The proteins termed adhesin were found to be predicted with a high
probability (P_ad > 0.51) in various pathogens. Some adhesin sequences turned out to be
identical or homologous to proteins that are antigenic or implicated in virulence. By
this approach, proteins could be identified and short-listed for further testing in
development of new vaccine formulations to eliminate diseases caused by various
pathogenic organisms.

DESCRIPTION OF TABLES

Table 1: Output file format given by SPAAN.

Table 2: Organism Name, Accession number, Number of base pairs, Date of release
and Total number of proteins.

Table 3: Prediction of well characterized adhesins from various bacterial pathogens
using SPAAN.

Table 4: Analysis of predictions made by SPAAN on genome scans of a few selected
pathogenic organisms.

Table 5: GI numbers and Gene IDs of new putative adhesins predicted by SPAAN in
the genomes listed in Table 2.

Table 6: GI numbers and Gene IDs of hypothetical proteins predicted as putative
adhesins by SPAAN in the genomes listed in Table 2.

Table 7: The list of 198 adhesins found in bacteria.

Brief description of the accompanying drawings

Figure 1 shows the Neural Network architecture.

Figure 2 shows assessment of SPAAN using defined test dataset.

Figure 3 (a) shows histogram plots of the number of proteins in the various P_ad value
ranges. (b) Pairwise sequence relationships among the adhesins were determined using
CLUSTAL W and plotted on the X-axis. Higher scores indicate similar pairs. (c) Plot for
non-adhesins. Data are plotted in the 4 quadrant format for clear inspection.

The software program was written in the C language and operated on the Red Hat Linux 8.0
operating system. The computer program accepts input protein sequences in FASTA
format and produces a tabulated output. The output table contains one row for each
protein, listing the probability outputs of each of the five modules, a weighted average
probability of these five modules (P_ad), and the function of the protein as described in
the input sequence file. This software is called SPAAN (A Software for Prediction of
Adhesins and Adhesin-like proteins using Neural Networks) and a software copyright
has been filed. Although this software has multiple modules, the running of these
modules has been integrated and automated. The user only needs to run one
command.
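To illustrate how the five module outputs might be combined into the single P_ad value, the following is a minimal Python sketch, not the SPAAN source (which is in C). The function names, the example probabilities and the equal placeholder weights are assumptions made for illustration; the specification states only that P_ad is a weighted average of the module probabilities and that proteins with P_ad > 0.51 are reported as adhesins.

```python
def combined_probability(module_probabilities, module_weights):
    """Weighted average of the per-module adhesin probabilities (P_ad)."""
    if len(module_probabilities) != len(module_weights):
        raise ValueError("one weight is required per module probability")
    total_weight = sum(module_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(p * w for p, w in zip(module_probabilities, module_weights))
    return weighted / total_weight


def is_predicted_adhesin(p_ad, threshold=0.51):
    """Apply the P_ad > 0.51 cut-off stated in the specification."""
    return p_ad > threshold


# Hypothetical outputs of the five modules (A, C, D, H, M) for one protein.
probabilities = [0.90, 0.80, 0.70, 0.60, 0.50]
# Placeholder weights (assumption): equal weighting of all five modules.
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

p_ad = combined_probability(probabilities, weights)
print(round(p_ad, 2), is_predicted_adhesin(p_ad))
```

Keeping the weights as an explicit argument lets the per-module weighting be changed without touching the thresholding step.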

AAcompo.c:

Input: File containing protein sequences in the FASTA format.

Output: File containing frequencies of all 20 AAs for each protein in one row.

charge.c:

Input: File containing protein sequences in the FASTA format.

Output: File containing frequency of charged amino acids (R, K, E and D) and
moments (up to 18th order) of the positions of charged amino acids.

hdr.c:

Input: File containing protein sequences in the FASTA format.

Output: File containing frequencies of 5 groups of amino acids formed on the
basis of their hydrophobicity and moments of their positions up to 5th order.

multiplets.c:

Input: File containing protein sequences in the FASTA format.

Output: File containing fractions of multiplets of each of the 20 amino acids.

querydfpep.c:

Input: File 1 containing protein sequences in the FASTA format.
File 2 containing the list of the significant dipeptides in dipeptide analysis.

Output: File containing frequencies of the dipeptides listed in the input File 2
for each protein in the input File 1.

train.c:

Input: File containing the following specifications:

1. Number of input and output parameters.

2. Number of nodes in the hidden layers.

3. Names of the training, validation and test data files.

4. Learning rate, coefficient of moment.

5. Maximum number of cycles for training.

Output: Outputs are as follows.

1. Output of the trained NN for the test data set.

2. Values of the weight connections in the trained NN.

3. Some extra information about training.

recognize.c:

Input: File containing the following specifications:

1. Number of input and output parameters.

2. Number of nodes in the hidden layers.

3. Name of the query input file.

4. Name of the file containing values of the weight connections for the
trained NN.

5. Name of the output file.

Output: Outputs for the query entries calculated by the trained NN.

standard.c:

Input: File containing protein sequences in FASTA format.

Output: File containing protein sequences in FASTA format with all the new line
characters lying within a sequence removed.

filter.c:

Input: File containing protein sequences in FASTA format.

Output: File containing protein sequences from the input except those which
are short in length (<50 AAs) and which contain any amino acid other than the
20 known amino acids.
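The preprocessing performed by standard.c and filter.c can be sketched as follows. This is an illustrative Python approximation, not the original C code; the function names are hypothetical, and the sketch assumes that a sequence is dropped if it is shorter than 50 residues or contains any non-standard residue.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def read_fasta(text):
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_records(records, min_length=50):
    """Keep sequences of at least min_length made only of standard residues."""
    return [
        (header, seq)
        for header, seq in records
        if len(seq) >= min_length and set(seq) <= STANDARD_AMINO_ACIDS
    ]


example = (
    ">p1 wrapped sequence\n" + "ACDEFGHIKL" * 3 + "\n" + "ACDEFGHIKL" * 3 + "\n"
    ">p2 too short\nACDE\n"
    ">p3 non-standard residue\n" + "X" * 60 + "\n"
)
kept = filter_records(read_fasta(example))
print([header for header, _ in kept])
```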
The five attributes:

Amino Acid frequencies

Amino acid frequency f_i = (counts of i-th amino acid in the sequence) / l; i = 1...20,
where l is the length of the protein.
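As a minimal sketch of this attribute (in Python rather than the C of AAcompo.c; the function name and the ordering of the 20 residues are assumptions), the 20 frequencies f_i can be computed as:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def amino_acid_frequencies(sequence):
    """Return f_i = (count of residue i) / l for the 20 standard amino acids."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]


freqs = amino_acid_frequencies("AAGK")
print(dict(zip(AMINO_ACIDS, freqs))["A"])
```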
Multiplet frequency

Multiplets are defined as homopolymeric stretches (X)n where X is any of the 20 amino
acids and n is an integer > 2. After identifying all the multiplets, the frequencies of the
amino acids in the multiplets were computed as
f_i(m) = (counts of i-th amino acid occurring as multiplet) / l
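A possible reading of the multiplet computation is sketched below in Python. Two points are assumptions of this sketch rather than statements of the specification: the "counts" are taken to be the number of residues lying inside multiplets, and the minimum run length is exposed as a parameter (the default of 3 follows the condition n > 2 as printed; pass min_run=2 if runs of two are meant to count).

```python
from itertools import groupby

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def multiplet_frequencies(sequence, min_run=3):
    """Return f_i(m): residues of type i inside homopolymeric runs, divided by l."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    # groupby yields maximal runs of identical consecutive residues.
    for residue, run in groupby(sequence):
        run_length = len(list(run))
        if residue in counts and run_length >= min_run:
            counts[residue] += run_length
    return [counts[aa] / length for aa in AMINO_ACIDS]


freqs = multiplet_frequencies("AAAGKK")
print(freqs[AMINO_ACIDS.index("A")], freqs[AMINO_ACIDS.index("K")])
```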
Dipeptide frequencies

The frequency of a dipeptide (i, j) is f_ij = (counts of ij-th dipeptide) / (total dipeptide
counts); i and j range from 1 to 20.
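The dipeptide frequencies for the 20 dipeptides used by SPAAN could be computed along the following lines. This Python sketch is illustrative only; it assumes that dipeptides are counted in an overlapping fashion, so that a sequence of length l contains l - 1 dipeptides, which the specification does not state explicitly.

```python
from collections import Counter

# The 20 dipeptides named in the specification for attribute D.
SELECTED_DIPEPTIDES = [
    "NG", "RE", "TN", "NT", "GT", "TT", "DE", "ER", "RR", "RK",
    "RI", "AT", "TS", "IV", "SG", "GS", "TG", "GN", "VI", "HR",
]


def dipeptide_frequencies(sequence, dipeptides=SELECTED_DIPEPTIDES):
    """Return f_ij = count(ij) / (total dipeptide count) for the given dipeptides."""
    total = len(sequence) - 1  # overlapping dipeptides (assumption)
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = Counter(sequence[k:k + 2] for k in range(total))
    return [counts[d] / total for d in dipeptides]


freqs = dipeptide_frequencies("NGNG")
print(freqs[SELECTED_DIPEPTIDES.index("NG")])
```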

It has been found that dipeptide repeats in proteins are important for functional
expression of the clumping factor present on the Staphylococcus aureus cell surface that
binds to fibrinogen (Hartford et al 1999). Thus we included the dipeptide frequency
module. The total number of dipeptides is 400. For optimal training of the Neural Network,
the ratio of the total number of input vectors to the total number of weight connections
must be around 2 to avoid over fitting (Andrea et al). Therefore, we identified the
dipeptides whose frequencies in the adhesin data set (469 proteins, see database
construction) were significantly different from those in the non-adhesin dataset (703
proteins) using a t-test. The frequencies of the top 20 dipeptides (when arranged in the
descending order of the p-values of the t-test) were fed to the Neural Network. These
dipeptides were (using single letter IUPAC-IUB code) NG, RE, TN, NT, GT, TT, DE,
ER, RR, RK, RI, AT, TS, IV, SG, GS, TG, GN, VI, and HR. With frequency inputs
for 20 dipeptides and 28 neurons in the 2nd layer, the total number of weight
connections is 588, which is in keeping with the criterion of avoiding over fitting.
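The figure of 588 can be reproduced with a short calculation. The Python sketch below assumes that the count covers the input-to-hidden connections plus one bias weight per hidden neuron, which is the reading that yields 588, and that the 469 + 703 proteins are the "input vectors" of the ratio; both are interpretations, not statements from the specification.

```python
def weight_connections(n_inputs, n_hidden):
    """Input-to-hidden weights, including one bias weight per hidden neuron."""
    return (n_inputs + 1) * n_hidden


n_weights = weight_connections(n_inputs=20, n_hidden=28)
n_proteins = 469 + 703  # adhesin and non-adhesin data sets
ratio = n_proteins / n_weights

print(n_weights, round(ratio, 2))
```

Under these assumptions the ratio comes out close to 2, in line with the stated criterion.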
Charge composition

The input frequency of charged amino acids (R, K, E and D considering the ionization
properties of the side chains at pH 7.2) is given by f_c = (counts of charged amino acids) / l.
Further, information on the characteristics of the distribution of the charged amino
acids in a given protein sequence was provided by computing the moments of the
positions of the occurrences of the charged amino acids. Since moments characterize
the patterns of distribution such as skewness and kurtosis (sharpness of the peak), we
have used them to represent the distribution patterns of the charged residues in the
sequence.

The general expression to compute moments of a given order, say r, is

\[ M_r = \frac{1}{N} \sum_{i=1}^{N} (X_i - X_m)^r \]

where M_r = r-th order moment of the positions of charged amino acids
X_m = mean of all positions of charged amino acids
X_i = position of i-th charged amino acid
N = number of charged amino acids in the sequence

The moments of 2nd to 19th order were used to train the ANN, constituting a total of 20
inputs together with the frequency of charged amino acids and the length of the protein.
The upper limit of 19th order was set based on assessments of sensitivity and specificity
on a small dataset of adhesins and non-adhesins. Moments of order greater than 19 were
not useful in improvement of performance.
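The 20 charge-composition inputs described above (frequency, length and the 18 central moments of order 2 to 19) can be sketched in Python as follows. The function name, the use of 1-based positions, the ordering of the output vector and the choice of returning zero moments for a sequence without charged residues are assumptions of this sketch; any scaling the original charge.c applies to the raw moments is not described and is therefore not reproduced.

```python
CHARGED = set("RKED")


def charge_inputs(sequence, min_order=2, max_order=19):
    """Return [f_c, length, M_2, ..., M_19] for the charged residues R, K, E, D."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    positions = [k + 1 for k, aa in enumerate(sequence) if aa in CHARGED]
    n = len(positions)
    frequency = n / length
    if n == 0:
        # Assumption: no charged residues gives all-zero moments.
        moments = [0.0] * (max_order - min_order + 1)
    else:
        mean = sum(positions) / n
        moments = [
            sum((x - mean) ** r for x in positions) / n
            for r in range(min_order, max_order + 1)
        ]
    return [frequency, float(length)] + moments


inputs = charge_inputs("KAAAE")
print(len(inputs), inputs[:5])
```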

Hydrophobic composition

A given protein sequence was digitally transformed using the hydrophobic scores of the
amino acids according to Brendel et al. (43). The scores for the five groups of amino
acids are: (-8 for K, E, D, R), (-4 for S, T, N, Q), (-2 for P, H), (+1 for A, G, Y, C, W),
(+2 for L, V, I, F, M).

The following inputs were given for each group:
(a) f_i = (counts of i-th group) / (total counts in the protein); i ranges from 1 to 5
(b) m_ji = j-th order moment of positions of amino acids in i-th group; j ranges from 2 to 5.

A total of 25 inputs representing the hydrophobic composition of a protein were fed to
the Neural Network. The rationale for using moments was the same as described in the
section on charge composition inputs.
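The 25 hydrophobic-composition inputs (5 group frequencies plus 4 moments for each of the 5 groups) could be assembled as in the following Python sketch. The function names, the 1-based positions, the ordering of the output vector and the zero moments for an empty group are assumptions made for illustration.

```python
# Five hydrophobicity groups and their scores, as listed in the text.
HYDROPHOBIC_GROUPS = [
    (-8, "KEDR"),
    (-4, "STNQ"),
    (-2, "PH"),
    (+1, "AGYCW"),
    (+2, "LVIFM"),
]


def central_moment(positions, order):
    """r-th order central moment of a list of positions (0.0 if the list is empty)."""
    if not positions:
        return 0.0
    mean = sum(positions) / len(positions)
    return sum((x - mean) ** order for x in positions) / len(positions)


def hydrophobic_inputs(sequence):
    """Return 5 group frequencies followed by moments of order 2..5 per group."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    frequencies, moments = [], []
    for _score, residues in HYDROPHOBIC_GROUPS:
        positions = [k + 1 for k, aa in enumerate(sequence) if aa in residues]
        frequencies.append(len(positions) / length)
        moments.extend(central_moment(positions, order) for order in range(2, 6))
    return frequencies + moments


inputs = hydrophobic_inputs("KKLL")
print(len(inputs), inputs[:5])
```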

Taken together, a total of 105 compositional properties of a given protein sequence were
used to predict their adhesin characteristics.

The software PropSearch uses 144 compositional properties of protein sequences to
detect possible structural or functional relationships between a new sequence and
sequences in the database (Hobohm and Sander 1995). The approach defines protein
sequence dissimilarity (or distance) as a weighted sum of differences of compositional
properties such as singlet and doublet amino acid composition, molecular weight and
isoelectric point (protein property search or PropSearch). Compositional properties of
proteins have also been used for predicting secretory proteins in bacteria and apicoplast
targeted proteins in Plasmodium falciparum (Zuegge, et al. 2001). The methods used
there are statistical methods, principal component analysis, self-organizing maps, and
supervised neural networks. In SPAAN, we have used 105 compositional properties in
five modules, viz. Amino acid frequencies, Multiplet frequencies, Dipeptide
frequencies, Charge composition and Hydrophobic composition. The 105 properties
used in SPAAN are: 20 for Amino acid frequencies; 20 for Multiplet frequencies; 20 for
Dipeptide frequencies (the top 20 significant dipeptides are used, based on t-test); 20 for
Charge composition (frequency of charged amino acids (R, K, E and D), the length of the
protein, and moments of 2nd to 19th order); and 25 for Hydrophobic composition. For the
latter, amino acids were classified into five groups: (-8 for K, E, D, R), (-4 for S, T, N, Q),
(-2 for P, H), (+1 for A, G, Y, C, W), (+2 for L, V, I, F, M). The 25 inputs consisted of the
frequency of each group and the moments of positions of amino acids in each group
from 2nd to 5th order.
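
As an illustration of the two simplest modules, the sketch below computes amino acid frequencies and the frequencies of a chosen set of dipeptides, and checks that the module sizes add up to 105. It is a minimal sketch, not the SPAAN implementation: the normalisation of dipeptide counts by the number of overlapping residue pairs is an assumption, and the function names are hypothetical.

```python
# Illustrative sketch only; normalisation choices are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of inputs per module as stated in the text (A, M, D, C, H).
MODULE_SIZES = {"A": 20, "M": 20, "D": 20, "C": 20, "H": 25}


def amino_acid_frequencies(sequence):
    """Return the 20 amino acid frequencies in the order of AMINO_ACIDS."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def dipeptide_frequencies(sequence, dipeptides):
    """Return the frequency of each listed dipeptide among all overlapping pairs."""
    sequence = sequence.upper()
    n_pairs = len(sequence) - 1
    if n_pairs < 1:
        raise ValueError("sequence too short for dipeptides")
    pairs = [sequence[i:i + 2] for i in range(n_pairs)]
    return [pairs.count(dp) / n_pairs for dp in dipeptides]
```

With 20 selected dipeptides passed as the second argument, dipeptide_frequencies yields the 20 inputs of the dipeptide module.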
Neural Network 

A feed forward error back propagation Neural Network was used. The program is a 
kind gift from Charles W. Anderson, Department of Computer Science, Colorado State 
University, Fort Collins, CO 80523, anderson@cs.colostate.edu 
Neural Network architecture

The Neural Network used here has a multi-layer feed-forward topology. It consists of
an input layer, one hidden layer and an output layer. This is a 'fully-connected' Neural
Network where each neuron i is connected to each unit j of the next layer (Figure 1).
The weight of each connection is denoted by w_ij. The state I_i of each neuron in the input
layer is assigned directly from the input data, whereas the states of hidden layer
neurons are computed by the sigmoid function,

$$h_j = \frac{1}{1 + \exp\left(-\left(w_{j0} + \sum_i w_{ij} I_i\right)\right)}$$

where w_j0 is the bias weight.
The back propagation algorithm was used to minimize the differences between the
computed output and the desired output. Ten thousand cycles (epochs) of iterations were
performed. Subsequently, the best epoch with minimum error was identified. At this
point the network produces approximate target values for a given input in the training
set.

A network was trained optimally for each attribute. Thus five networks were prepared.
The schematic diagram (Figure 1) shows the procedure adopted. The number of
neurons in the input layer was equal to the number of input data points for each
attribute (for example, 20 neurons for the 20 numerical input vectors of the amino acid
composition attribute). The optimal number of neurons in the hidden layer was
determined through experimentation for minimizing the error at the best epoch for each
network individually. An upper limit for the total number of weight connections was set
to half of the total number of input vectors to avoid overfitting, as suggested previously
(Andrea et al.).

Computer programs to compute individual compositional attributes were written in C
and executed on a PC under Red Hat Linux ver 7.3 or 8.0. The network was trained on
the training set; the error was checked and the network optimized using the validate set
through back propagation. The validate set was different from the training set. Since
well annotated adhesins were few, we used the 'validate set' itself as the test set for
preliminary evaluation of the performance and to obtain the fraction of correlation used to
compute the weighted average probability (P_ad value) described in the next section. The
training set had 367 adhesins and 580 non-adhesins. The validate set had 102 adhesins
and 123 non-adhesins. The adhesins were qualified with a digit '1' and the non-adhesins
were qualified with a digit '0'.

During predictions, the network is fed with new data from the sequences that were not
part of the training set. Each network assigns a probability value of being an adhesin to a
given sequence. The final probability is computed as described in the next section.
Probability of being an adhesin, the P_ad value
Query proteins are processed modularly through the network trained for each attribute.
Thus, five probability outputs are obtained. The final prediction was computed using the
following expression, which is a weighted linear sum of the probabilities from the five
modules:

$$P_{ad} = \frac{P_A\,fc_A + P_C\,fc_C + P_D\,fc_D + P_H\,fc_H + P_M\,fc_M}{fc_A + fc_C + fc_D + fc_H + fc_M}$$

where
P_i = probability from the i-th module,
fc_i = fraction of correlation of the i-th module of the trained Neural Network,
i = A (Amino acid frequencies), C (Charge composition), D (Dipeptide
frequencies), H (Hydrophobic composition), or M (Multiplet frequencies).

The fraction of correlation fc_i represents the fraction of total entries that were correctly
predicted (P_i,adhesin > 0.5 and P_i,non-adhesin < 0.5) by the trained network on the test set
used in preliminary evaluation (Charles Anderson).
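
The weighted average and the fraction of correlation can be sketched as follows. This is an illustration under stated assumptions: the dictionaries keyed by module letter and the function names are hypothetical, and any weights passed in by the caller are not values reported in this document.

```python
MODULES = "ACDHM"  # Amino acid, Charge, Dipeptide, Hydrophobic, Multiplet


def fraction_of_correlation(predictions, labels):
    """Fraction of entries predicted correctly: P > 0.5 for label 1, P < 0.5 for label 0."""
    if not labels:
        raise ValueError("no entries")
    correct = sum(
        1
        for p, y in zip(predictions, labels)
        if (y == 1 and p > 0.5) or (y == 0 and p < 0.5)
    )
    return correct / len(labels)


def pad_value(probabilities, fractions):
    """Weighted average of the five module probabilities, weighted by fc_i."""
    total_weight = sum(fractions[m] for m in MODULES)
    if total_weight == 0:
        raise ValueError("all weights are zero")
    weighted = sum(probabilities[m] * fractions[m] for m in MODULES)
    return weighted / total_weight
```

When all five fractions of correlation are equal, the P_ad value reduces to the plain mean of the five module probabilities.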
Neural Network 

A feed forward error back propagation Neural Network was used. The program was
downloaded from the web site with permission from the author, Charles W. Anderson, 
Department of Computer Science, Colorado State University, Fort Collins, CO 80523, 
anderson@cs.colostate.edu 
Statistical Analysis 

All statistical procedures were carried out using Microsoft Excel (Microsoft
Corporation Inc. USA). 
Sequence analysis 

Homology analysis was carried out using CLUSTAL W (Thompson et al 1994),
BLAST (Altschul et al 1990) and CDD (conserved domain database) search
(Marchler-Bauer et al 2002).

The whole genome sequences of microbial pathogens present new opportunities for the 
development of clinical applications such as diagnostics and vaccines. The present 
invention provides new leads for the development of candidate genes, and their 
encoded proteins in their functional relevance to preventive therapeutics. 

The protein sequences of both the classes, i.e. adhesin and non-adhesin, were
downloaded from the existing database (National Centre for Biotechnology Information
(NCBI), USA). A total of 105 compositional properties under the five sequence
attributes, namely amino acid composition, multiplet composition, dipeptide
composition, charge composition and hydrophobic composition, were computed by
computer programs written in C language. The attributes were computed for all the
proteins in both the databases. The sequence-based attributes were then used to train an
Artificial Neural Network for each of the protein attributes. Adhesins were qualified by
the digit '1' and non-adhesins were qualified by the digit '0'. Finally, each trained
Artificial Neural Network was used to identify potential adhesins which can be
envisaged to be useful for the development of preventive therapeutics against
pathogenic infections. Accordingly, the invention provides a computational method for
identifying adhesin and adhesin-like proteins of therapeutic potential, which comprises:

1. preparing two comprehensive data-sets of adhesin and non-adhesin proteins from
publicly available information on protein sequences,

2. calculating computationally the sequence-based attributes of the protein sequences in
the publicly available protein datasets using the specially developed Software for
Prediction of Adhesins and Adhesin-like proteins using Neural Networks (SPAAN),

3. training the Artificial Neural Network (ANN) for the selected attributes,

4. assigning a probability value of being an adhesin, "P_ad", to the query protein and
identifying adhesin-like properties in the query proteins with the help of the trained
Artificial Neural Network implemented in SPAAN,

5. validating computationally the protein sequences as therapeutic potentials by
comparing with the known protein sequences that are biochemically characterized in
the pathogen genome.

In an embodiment of the invention, the protein sequence data may be taken from an
organism, specifically but not limited to organisms such as Escherichia coli,
Haemophilus influenzae, Helicobacter pylori, Mycoplasma pneumoniae,
Mycobacterium tuberculosis, Rickettsia prowazekii, Porphyromonas gingivalis,
Shigella flexneri, Streptococcus mutans, Streptococcus pneumoniae, Neisseria
meningitidis, Streptococcus pyogenes, Treponema pallidum, and Severe Acute
Respiratory Syndrome associated coronavirus.

In another embodiment of the present invention, the different sequence-based attributes
used for identification of proteins of therapeutic potential comprise amino acid
composition, charge composition, hydrophobicity composition, multiplet frequencies,
and dipeptide frequencies.

In an embodiment, the non-homologous adhesin protein sequence may be compared
with that of known sequences of therapeutic applications in the selected pathogens.

In an embodiment of the invention, the sequences of adhesin or adhesin-like proteins
comprise the sequences of the sequence IDs listed in Tables 5 and 6 identified by the
method of the invention.

In another embodiment of the invention, the computer system comprises a central
processing unit executing the SPAAN program, which gives probabilities based on
different attributes using an Artificial Neural Network, and other in-built programs for
assessing attributes, all stored in a memory device accessed by the CPU; a display on
which the central processing unit displays the screens of the above mentioned programs
in response to user inputs; and a user interface device.

In one embodiment of the present invention, the particulars of the organisms such as
their name, strain, accession number in NCBI database and other details are given in 
Table 2: 

The invention is further explained with the help of the following examples, which are
given by way of illustration and should not be construed to limit the scope of the present
invention in any manner.
Example 1 
Operating SPAAN: 

The purpose of the program is to computationally calculate various sequence-based
attributes of the protein sequences.
The program works as follows:

The FASTA format files downloaded from http://www.ncbi.nlm.nih.gov were saved by
the name <organism_name>.faa, converted into the standard format by a C program,
and passed as input to another set of C programs which compute the 5 different
attributes of protein sequences (a total of 105 compositional properties in all 5 modules).
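
Reading such a FASTA file into (annotation, sequence) pairs can be sketched as below. This is a generic illustration and not the C conversion program mentioned above; the function name read_fasta is hypothetical, and the sketch assumes that every record starts with a line beginning with '>' and that sequence lines may be wrapped.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines.

    The header is returned without the leading '>'; wrapped sequence
    lines are joined and blank lines are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

An open file object can be passed directly, for example read_fasta(open(path)) with path pointing to an <organism_name>.faa file.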

The computed properties were fed as input to the 5 different Neural Networks. Each
trained network assigns a probability value of being an adhesin for a query protein. The
final probability (P_ad) was calculated as the weighted average of these five individual
probabilities. The weights were determined from a correlation value of correct
prediction during test runs of each of the five modules.

Input/Output format:
Downloaded Files and their format:

<organism_name>.faa: file which stores the annotation and the protein sequence.
Input file Format: FASTA
>gi|<annotation>
For example,

>gi|2314605|gb|AAD08472|histidine and glutamine-rich protein
MAHHEQQQQQQANSQHHHHHHHAHHHHYYGGEHHHHNAQQHAEQQAEQQ
AQQQQQQQAHQQQQQKAQQQNQQY

>gi|3261822|gnl|PID|e328405 PE_PGRS
MIGDGANGGPGQPGGPGGLLYGNGGHGGAGAAGQDRGAGNSAGLIGNGGAG
GAGGNGGIGGAGAPGGLGGDGGKGGFADEFTGGFAQGGRGGFGGNGNTGAS
GGMGGAGGAGGAGGAGGLLIGDGGAGGAGGIGGAGGVGGGGGAGGTGGGG
VASAFGGGNAFGGRGGDGGDGGDGGTGGAGGARGAGGAGGAGGWLSGHSG
AHGAMGSGGEGGAGGGGGARGEAGAGGGTSTGTNPGKAGAPGTQGDSGDP
GPPG

>gi|...

Table 1: Output file format given by SPAAN
<organism_name>.out

SN  Pa        Pc        Pd        Ph        Pm        P_ad value  Protein Name
1   0.05683   0.290803  0.441338  0.50304   0.029503  0.260485    >gi|32454344|gb|AAP82966.1|orf1a polyprotein [SARS coronavirus Hong Kong ZY-2003]
2   0.639235  0.166721  0.054583  0.935385  0.453498  0.462452    >gi|32454345|gb|AAP82967.1|orf1ab polyprotein [SARS coronavirus Hong Kong ZY-2003]
3   0.651111  0.911504  0.438696  0.543944  0.924044  0.690247    >gi|32454346|gb|AAP82968.1|spike glycoprotein [SARS coronavirus Hong Kong ZY-2003]
4   0.464324  0.655003  0.179503  0.008700  0.241573  0.300970    >gi|32454347|gb|AAP82969.1|Orfia [SARS coronavirus Hong Kong ZY-2003]

Where Pa, Pc, Pd, Ph, Pm are the outputs of the five Neural Networks.
Example 2
Organisms and sequence numbers

Table 2: Organism Name, Accession number, Number of base pairs, Date of release
and Total number of proteins analyzed

Organism Name                             Accession Number  Number of base pairs  Date of release  Total no. of proteins
E. coli O157:H7                           NC_002695         5498450               7-Mar-2001       5361
H. influenzae Rd                          NC_000907         1830138               30-Sep-1996      1709
H. pylori J99                             NC_000921         1643831               10-Sep-2001      1491
M. pneumoniae                             NC_000912         816394                2-Apr-2001       689
M. tuberculosis H37Rv                     NC_000962         4411529               7-Sep-2001       3927
R. prowazekii strain Madrid E             NC_000963         1111523               10-Sep-2001      835
P. gingivalis W83                         NC_002950         2343476               9-Sep-2003       1909
S. flexneri 2a str. 2457T                 NC_004741         4599354               23-Apr-2003      4072
S. mutans UA159                           NC_004350         2030921               25-Oct-2002      1960
S. pneumoniae R6                          NC_003098         2038615               6-Sep-2001       2043
N. meningitidis serogroup A strain Z2491  NC_003116         2184406               27-Sep-2001      2065
S. pyogenes MGAS8232                      NC_003485         1895017               31-Jan-2002      1845
T. pallidum subsp. pallidum str. Nichols  NC_000919         1138011               7-Sep-2001       1036
Severe Acute Respiratory Syndrome (SARS)  AY291315          29727                 11-JUN-2003      14
  associated coronavirus Frankfurt 1
SARS coronavirus HSR 1                    AY323977          29751                 15-OCT-2003      14
SARS coronavirus ZJ01                     AY297028          29715                 19-MAY-2003      3
SARS coronavirus TW1                      AY291451          29729                 14-MAY-2003      11
SARS coronavirus CUHK-Su10                AY282752          29736                 07-MAY-2003      4
SARS coronavirus Urbani                   AY278741          29727                 12-AUG-2003      12
SARS coronavirus                          NC_004718         29751                 9-Sep-2003       29
SARS coronavirus Tor2                     AY274119          29751                 16-MAY-2003      15
SARS coronavirus GD01                     AY278489          29757                 18-AUG-2003      12
SARS coronavirus CUHK-W1                  AY278554          29736                 31-JUL-2003      11
SARS coronavirus BJ01                     AY278488          29725                 01-MAY-2003      11



Example 3 

The multi-layered feed forward Neural Network architecture implemented in SPAAN is
shown in figure 1. A given protein sequence in FASTA format is first processed through
the 5 modules A, C, D, H, and M to quantify the five types of compositional attributes. A:
Amino acid composition, C: Charge composition, D: Dipeptide composition of the 20
dipeptides (NG, RE, TN, NT, GT, TT, DE, ER, RR, RK, RI, AT, TS, IV, SG, GS, TG,
GN, VI, HR), H: Hydrophobic composition, M: Amino acid frequencies as Multiplets.
The sequence shown is part of the FimH precursor (gi 5524634) of E. coli.

Subsequently, these numerical data are input to the input neuron layer. The directions
of arrows show data flow. The number of neurons chosen in the input layer was equal
to the number of the numerical input vectors of each module. The network was
optimally trained through minimization of error of detection based on the validate set
through back propagation. The details are described in the methods. Each network
module assigns a probability value of the protein being an adhesin based on the
corresponding attribute. The final probability of a protein sequence being an adhesin is
the P_ad value, a weighted average of the individual probabilities in which each weight is
the associated fraction of correlation, a measure of the strength of the prediction.
Example 4 

Performance of SPAAN was assessed using a test set of 37 adhesins and 37 non-adhesins
that were not part of the training set. Matthew's correlation coefficient (Mcc, plotted on
Y-axis) was computed for all the proteins with P_ad values above a given threshold
(plotted on X-axis) (figure 2). The Matthew's correlation is defined as:

$$Mcc = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TN + FN)(TN + FP)(TP + FN)(TP + FP)}}$$

where TP = True Positives, TN = True Negatives, FP = False Positives, FN = False
Negatives.

Here TPs are adhesins and TNs are non-adhesins. In general, adhesins have a high P_ad
value, whereas non-adhesins have a low P_ad value. Thus known adhesins with a P_ad value
above a given threshold are true positives, whereas known non-adhesins with a P_ad value
below the given threshold are true negatives. The sensitivity, Sn, is given by

$$Sn = \frac{TP}{TP + FN}$$

and the specificity, Sp, is given by

$$Sp = \frac{TP}{TP + FP}$$

False negatives are those cases wherein a known adhesin had a P_ad value lower than the
chosen threshold. Similarly, a known non-adhesin with a P_ad value higher than the chosen
threshold was taken as a false positive. A theoretical polynomial curve of second order
(dashed line) was fitted to the observed curve (smooth line) with a Karl-Pearson
correlation coefficient R^2 = 0.9799. The maximum point of the theoretical curve (where
the first derivative vanishes and the second derivative is negative) was chosen as
reference (vertical dotted line) to identify the maximum Mcc = 0.94 on the observed curve
(shown by arrow). The corresponding P_ad value threshold was 0.51. At this P_ad value
threshold, Sn and Sp were 0.89 and 1.0 respectively. Note that the Mcc does not drop
down to the x-axis because the highest P_ad value attained by adhesins was 0.939 in
comparison to the theoretical attainable limit of 1.0.
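
The quantities used in this example can be sketched as below. The sketch is illustrative only: the function names are hypothetical, the treatment of a P_ad value exactly equal to the threshold as positive is an assumption, and returning 0.0 for an undefined Mcc is a convention chosen here. Sp follows the definition given in this document, TP / (TP + FP).

```python
import math


def confusion_counts(pad_adhesins, pad_non_adhesins, threshold):
    """Count TP, TN, FP, FN for known adhesins and non-adhesins at a threshold."""
    tp = sum(1 for p in pad_adhesins if p >= threshold)
    fn = len(pad_adhesins) - tp
    fp = sum(1 for p in pad_non_adhesins if p >= threshold)
    tn = len(pad_non_adhesins) - fp
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthew's correlation coefficient; 0.0 when the denominator is zero."""
    denominator = math.sqrt((tn + fn) * (tn + fp) * (tp + fn) * (tp + fp))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def sensitivity(tp, fn):
    """Sn = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """Sp = TP / (TP + FP), as defined in this document."""
    return tp / (tp + fp)
```

Evaluating mcc over a range of thresholds gives the observed curve to which a second-order polynomial can then be fitted.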
Example 5 

Assessment of SPAAN on well known adhesins from various bacterial pathogens. 
Table 3. Prediction of well characterized adhesins from various bacterial pathogens
using SPAAN.

Species                    Disease caused                          Adhesin^a              Host ligand                                              P_ad value^b (range)
E. coli                    Diarrhoea                               PapG (27)              alpha-D-Gal(1-4)beta-D-Gal-containing receptors          0.84-0.76
                                                                   SfaS (5)               alpha-sialyl-beta-2,3-b-galactose                        0.94-0.94
                                                                   FimH (63)              D-mannosides                                             0.96-0.23^c
                                                                   Intimin (12)           tyrosine-phosphorylated form of host cell receptor Hp90  0.95-0.78
                                                                   PrsG (5)               Gal(alpha1-4)Gal                                         0.86-0.85
Nontypeable H. influenzae  Influenza                               HMW1, HMW2             Human epithelial cells                                   0.97
                                                                   Hia (8)                human conjunctival cells                                 0.93-0.90
H. influenzae              bacterial meningitis^d                  HifE (18)              Sialylganglioside-GM1                                    0.85-0.73
K. pneumoniae              Pneumonia                               MrkD                   type V collagen                                          0.8?, 0.85
B. pertussis               Whooping cough                          FHA                    Sulphated sugars on cell-surface glycoconjugates
                                                                   Pertactin              Integrins                                                0.43
Y. enterocolitica          Enterocolitis                           YadA (5)               beta1 integrins                                          0.88-0.79
S. mutans                  Dental caries                           SpaP (2)               Salivary glycoprotein                                    0.88, 0.87
                                                                   PAc                    Salivary glycoprotein                                    0.88
Streptococcus gordonii     Oral cavity                             SspA (2)               Salivary glycoprotein                                    0.85, 0.84
                                                                   CshA                   Fibronectin                                              0.78
                                                                   CshB                   Fibronectin                                              0.63
                                                                   ScaA                   Co-aggregation                                           0.71
                                                                   SspB (2)               Salivary glycoprotein                                    0.85, 0.84
Streptococcus sobrinus     Tooth decay                             SpaA                   Salivary glycoprotein                                    0.89
                                                                   PAg (2)                Salivary glycoprotein                                    0.89, 0.73
Streptococcus pyogenes     Scarlet fever                           Protein F              Fibronectin                                              0.49
Streptococcus pneumoniae   Bacterial pneumonia                     PsaA (5)               Human nasopharyngeal cells                               0.82-0.78
                                                                   CbpA^e/SpsA/PbcA/PspC  phosphorylcholine of the teichoic acid                   0.81-0.49
Streptococcus parasanguis  Valve endocarditis                      FimA                   Salivary glycoprotein, fibrin                            0.76
Streptococcus sanguis      Tooth decay                             SsaB                   Salivary glycoprotein                                    0.71
Enterococcus faecalis      Empyema in patients with liver disease  EfaA                   Unknown                                                  0.83
Staphylococcus aureus      Food poisoning                          FnbA                   Fibronectin                                              0.8
                                                                   FnbB (3)               Fibronectin                                              0.78, 0.77, 0.69
Helicobacter pylori        Peptic ulcers                           BabA (17)              difucosylated Lewis b blood group antigen                0.87-0.68
a : The number of sequences from different strains and homologs from related species
analyzed is shown in parentheses.
b : Rounded off to the second decimal.
c : Out of 63 FimH proteins, 54 were from E. coli, 6 from Shigella flexneri, 2 from
Salmonella enterica and 1 was from Salmonella typhimurium. Except 2 FimH proteins,
the rest had P_ad ≥ 0.51. The 2 exceptions (gi numbers: 5524636, 1778448) were from E.
coli. The gi:5524636 protein is annotated as a FimH precursor but is much shorter (129
amino acids) than other members of the family. The gi:1778448 protein is a S.
typhimurium homolog in E. coli.

d : Other ailments include pneumonia, epiglottitis, osteomyelitis, septic arthritis and
sepsis in infants and older children.

e : The adhesin CbpA is also known by the alternative names SpsA, PbcA and PspC. A total
of seven sequences were analyzed. Except 1 PspC sequence, all the rest had P_ad ≥ 0.51.
Example 6 

Ability of SPAAN to discriminate adhesins from non-adhesins at P_ad ≥ 0.51 (figure 3a).

Example 7

The non-homology character of SPAAN assessed in both adhesins and non-adhesins
(figure 3b and 3c).

Figure 3 (a - c). SPAAN is non-homology based software. A total of 130 adhesins and
130 non-adhesins were analyzed to assess whether the predictive power of SPAAN
could be influenced by the sequence relationships. (a) Histogram plots of the number of
proteins in the various P_ad value ranges are shown. Shaded bars represent adhesins
whereas open bars represent non-adhesins. Note SPAAN's ability to segregate
adhesins and non-adhesins into two distinct cohesive groups. (b) Pairwise sequence
relationships among the adhesins were determined using CLUSTAL W and plotted on the
X-axis. Higher scores indicate similar pairs. The corresponding difference in P_ad
values (ΔP_ad) within the same protein pair was plotted on the Y-axis. Each point in the
diagram represents a pair. The arrow points to protein pairs of the FimH family with high
ΔP_ad values in spite of high similarity. Since one of the FimH proteins (gi: 5524636) had
a very low P_ad value, all pairs with this false negative protein show high ΔP_ad values.
The protein (gi: 5524636) is of much shorter length compared with other members of the
same family. (c) Plot for non-adhesins. Data are plotted in the 4 quadrant format for
clear inspection. Note that among protein pairs with CLUSTAL W score < 20 the
majority (82% in adhesins and 86% in non-adhesins) have ΔP_ad < 0.2. These data
support the non-homology character of SPAAN.

Example 8

Genome scan of pathogens by SPAAN identifies well known adhesins and new
adhesins and adhesin-like proteins

Table 4. Analysis of predictions made by SPAAN on genome scans of a few selected
pathogenic organisms^a

Protein class                                                 Escherichia coli O157:H7  Mycobacterium tuberculosis H37Rv  SARS associated coronavirus (11 strains)
Total number of proteins with P_ad ≥ 0.51                     575
Known adhesins                                                17^b
Putative proteins with adhesin-like characteristics           92^c                      105^j
Hypothetical proteins with adhesin-like characteristics       22^d
Proteins likely to be extracytoplasmic or located at surface  190^e                     191^k                             5^m
Phage proteins                                                30^f
Others                                                        13^g                      6^l
Hypothetical proteins                                         157^h                     86^h
Wrong predictions                                             54^i                      47^i

a : SPAAN has general applicability. The three pathogens chosen here are those in
which intense investigations are being conducted presently. M. tuberculosis is of
special importance to developing countries.
b : Fimbrial adhesins, AidA-I, gamma intimin, curlin, translocated intimin receptor,
putative adhesin and transport, Iha, prepilin peptidase dependent protein C.
c : These proteins have been annotated as proteins with a putative function. These
sequences were analyzed using CDD (Conserved domain database, NCBI) and BLAST
searches. Adhesin-like domains were found in these proteins.
d : These proteins have been annotated as 'hypothetical'. These sequences were
analyzed using CDD and BLAST searches. Adhesin-like domains were found in these
proteins.
e : These proteins are outer membrane, extracellular, transport, surface, exported,
flagellar, periplasmic lipoprotein, and proteins annotated as 'hypothetical' but found to
have similar functions listed here using BLAST and CDD searches.
f : The phage proteins were of the following functional roles - tail fiber, head
decoration, DNA injection, tail, major capsid, host specificity, endolysin.
g : Proteins predicted by SPAAN but not readily classifiable into the classes listed here
have been collectively grouped as 'Others'. However, some of these proteins are known
to participate in host-pathogen interactions. The annotated functional roles are type III
secretion, antibiotic resistance, heat shock, acid shock, structural, tellurium resistance,
terminase, Hep-like, Sec-independent translocase, uncharacterized nucleoprotein,
HicB-like.
h : These proteins have been annotated as hypothetical. Re-analyses of these proteins
using BLAST and CDD failed to identify any function for these proteins.
i : These proteins have been annotated with functional roles that are very likely to occur
within the cell. Hence these proteins may have only a remote possibility of functioning as
adhesins or adhesin-like proteins. Therefore this set of proteins has been incorrectly
predicted as adhesins or adhesin-like by SPAAN.
j : These proteins are PE_PGRS, PE proteins. Several reports (for example Brennan et
al) indicate that PE_PGRS proteins may be localized to the cell surface and aid in
host-pathogen interaction.
k : Lipoproteins (lpp, lpq, lpr), PPE, outer membrane, surface, transport, secreted,
periplasmic, extracellular, ESAT-6, peptidoglycan binding, exported, mpt (with
extracellular domains), and proteins annotated as 'hypothetical' but found to have
similar functions listed here using BLAST and CDD searches.
l : These proteins were of the following functions - glutaredoxin-like thioltransferase,
putative involvement in molybdate uptake, ATP synthase chain, sulphotransferases,
S. erythraea rhodanese-like protein M29612|SERCYSA_5, unknown function.
m : These proteins were the spike glycoprotein with antigenic properties, and nsp2, nsp5,
nsp6 and nsp7.

Table 5: New putative adhesins predicted by SPAAN in the genomes listed in table 2
(Total number = 279)

Protein GI Gene ID Protein name 
Number 

Escherichia coli 0157:H7 

13360742 912619 hemagglutinin/hemolysin-related protein 

putative ATP-binding component of a transport system 
putative tail fiber protein 

minor fimbrial subunit/D-mannose specific adhesin 
putative fimbrial-like protein 
AidA-I adhesin-like protein 
putative fimbrial protein 
putative invasin 
putative invasin 
Gamma intimin 

putative DNA transfer protein precursor 
putative fimbrial protein 
AidA-I adhesin-like protein 
putative fimbrial-like protein 
putative fimbrial-like protein 

putative ATP-binding component of a transport system 
putative fiagellin structural protein 
putative type 1 fimbrial protein precursor 
curlin major subunit CsgA 
translocated intimin receptor Tir 
putative major pilin protein 

putative ATP-binding component of a transport system and 
adhesin protein 

export and assembly outer membrane protein of type 1 
fimbriae 

homolog of Salmonella FimH protein 



13362986 

13361114 

13364757 

13362687 

13360856 

13364140 

13359793 

13364768 

13364034 

13362703 

13364141 

13359819 

13360480 

13362692 

13362585 

13359881 

13361579 

13360880 

13364036 

13360740 

13361582 



914770 

913228 

913676 

915687 

912599 

915374 

914435 

913650 

915471 

915668 

915376 

914463 

917768 

915681 

916824 

914526 

917311 

913991 

915465 

912615 

917317 



13364754 913683 



13360484 917767 
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13364751 913688 major type 1 subunit fimbrin 

13359597 913742 putative fimbrial protein 

13362550 916787 putative ATP-binding component of a transport system 

13359595 913739 putative fimbrial protein 

13359599 913748 probable outer membrane porin protein involved in fimbrial 

assembly 

1 3363900 9 ^ 5704 putative fimbrial protein precursor 

13361575 917307 putative fimbrial-like protein 
13364756 913678 fimbrial morphology 

1 3 3 63496 9 1 6 1 42 truncated putative fimbrial protein 

13359601 913761 putative fimbrial-like protein 

13364145 915368 putative type 1 fimbrial protein 

13363902 915708 putative outer membrane usher protein precursor 

13361576 917309 putative outer membrane protein 
13361013 913353 putative major tail subunit 
13364755 913682 fimbrial morphology 

133 6073 8 91 2793 putative outer membrane usher protein 

13363928 915608 alpha-amylase 

13363495 9 1 6 1 44 putative outer membrane protein 

13362383 916617 putative type-1 fimbrial protein 

133 643 73 9 1 4972 outer membrane vitamin B 1 2 receptor protein B tuB 

13360879 912479 minor curlin subunit precursor CsgB 

13360739 912756 putative chaperone protein 

13361574 917314 putative fimbrial-like protein 

13361 127 913212 outer membrane protease precursor 

13363210 91 6442 putative lipoprotein 

13361104 913238 major tail protein 

13361709 917446 putative major tail subunit 

13359725 914366 outer membrane pore protein PhoE 

13360875 913765 curli production assembly/transport component CsgF 

13362170 913927 putative outer membrane protein 

13361473 917203 putative BigB-like protein 



WO 2005/076010 



29 



PCT/IN2005/000037 



13364025 915286 EspF protein 

13360081 916982 outer membrane receptor for ferric enterobactin (enterochelin) 

and colicins B and D 

13362977 914779 hypothetical lipoprotein 

13360351 917632 outer membrane protein X 

13360696 914208 putative outer membrane precursor 

] 3361456 91 7206 putative outer membrane protein 

13361 626 9 1 7374 putative outer host membrane protein precursor 

13361698 917449 putative outer membrane protein 

133621 86 913421 putative outer membrane protein precursor 

13362697 915676 long-chain fatty acid transport protein FadL 

13360918 914188 flagellar hook protein FlgE 

13360737 912506 putative outer membrane protein 

13360342 917629 putative outer membrane receptor for iron transport 

13363396 916248 outer membrane channel TolC 

13361958 912705 putative scaffolding protein in the formation of a murein- 

synthesizing holoenzyme 

13359921 914566 nucleoside-specific channel-forming protein TSX 

133 60944 913890 outer membrane receptor for ferric iron uptake 

13359998 914644 putative outer membrane transport protein 

13363390 916251 putative ferrichrome iron receptor precursor 

13364227 915153 outer membrane phospholipase A

13361982 912846 putative outer membrane protein

13360129 917032 a minor lipoprotein

13361817 912692 putative outer membrane protein

13360233 917507 membrane spanning protein TolA

13362837 915218 putative outer membrane lipoprotein

13362328 912985 putative colanic acid biosynthesis glycosyl transferase 

Haemophilus influenzae Rd

16272254 949521 prepilin peptidase-dependent protein D
16272928 950762 immunoglobulin A1 protease
16272129 951072 lipoprotein
16273251 950616 hemoglobin-binding protein
30995429 950130 opacity protein
16272854 949634 protective surface antigen D15
16272283 950648 opacity associated protein
16272604 949701 hemoglobin-binding protein

Helicobacter pylori J99

4155101 889167 putative vacuolating cytotoxin (VacA) paralog
4154798 890022 putative vacuolating cytotoxin (VacA) paralog
4155426 890036 putative vacuolating cytotoxin (VacA) paralog
4155390 890075 vacuolating cytotoxin
4155400 890058 outer membrane protein - adhesin
4155681 889718 putative Outer membrane protein
4155420 890042 Outer membrane protein/porin
4155775 889799 outer membrane protein - adhesin
4155419 890044 Outer membrane protein/porin
4154526 889066 putative Outer membrane protein
4154724 889419 putative Outer membrane protein
4155862 890404 putative Outer membrane protein
4156048 889958 putative IRON(III) DICITRATE TRANSPORT PROTEIN
4154510 889297 putative Outer membrane protein
4155432 889515 putative outer membrane protein
4155623 889671 putative Outer membrane protein
4155700 889739 putative Outer membrane function
4154740 889426 Outer membrane protein/porin
4155692 889743 putative Outer membrane protein
4155594 889648 putative outer membrane protein
4155680 889719 putative Outer membrane protein
4155217 890243 putative Outer membrane protein
4155958 889905 putative Outer membrane protein
4155201 890259 putative Outer membrane protein
4155013 889232 cag island protein
4154974 889032 putative Outer membrane protein
4155214 890244 putative Outer membrane protein
4154973 889042 Outer membrane protein
4155344 890115 putative Outer membrane protein
4155099 889160 FLAGELLIN A
4155023 888978 cag island protein
4155035 889201 cag island protein, CYTOTOXICITY ASSOCIATED IMMUNODOMINANT ANTIGEN
4155289 890164 NEURAMINYLLACTOSE-BINDING HEMAGGLUTININ PRECURSOR

Mycoplasma pneumoniae

13507881 877207 involved in cytadherence
13507880 877268 ADP1_MYCPN adhesin P1
13508228 877211 species specific lipoprotein
13508181 877124 species specific lipoprotein
13508179 877071 Mollicute specific lipoprotein, MG307 homolog, from M. genitalium
13508178 877118 Mollicute specific lipoprotein, MG307 homolog, from M. genitalium
13508176 876797 Mollicute specific lipoprotein, MG307 homolog, from M. genitalium
13508175 876848 Mollicute specific lipoprotein, MG307 homolog, from M. genitalium
13508106 876953 involved in cytadherence
13508350 877112 similar to phosphate binding protein PstS

Mycobacterium tuberculosis H37Rv

15607496 886491 PPE
15607445 886592 PPE
15610644 888270 PE_PGRS
15608588 886605 PE_PGRS
15609627 887941 PE_PGRS
15610643 888256 PE_PGRS
15607718 887725 PE_PGRS
15609054 885362 PPE
15610486 888113 PPE
15610483 888120 PPE
15610479 888033 PPE
15609771 888573 PE_PGRS
15610648 888306 PE_PGRS
15610481 888114 PE_PGRS
15608117 885264 PE_PGRS
15607973 885391 PE_PGRS
15608231 885258 PE_PGRS
15608906 885429 PE_PGRS
15608891 885544 PPE
15609990 888171 PE_PGRS
15609055 885506 PPE
15608227 887094 PE_PGRS
15610524 888151 PE_PGRS
15609490 886003 PPE
15607886 888664 PE_PGRS
15609624 887909 PE_PGRS
15607420 886621 PE_PGRS
15608897 885325 PE_PGRS(wag22)
15608590 886595 PE_PGRS
15609728 887992 PE_PGRS
15608012 885742 PE_PGRS
15608534 886745 PE_PGRS
15608940 885730 PE_PGRS
1 DOU /oof oooODZ
15609235 888312 PE_PGRS
15610694 887822 PPE
15609533 885517 PE_PGRS
15610480 PE_PGRS


Rickettsia prowazekii strain Madrid E

15604316 883411 CELL SURFACE ANTIGEN (sca3)
15604546 883694 CELL SURFACE ANTIGEN (sca5)

Porphyromonas gingivalis W83

34541453 2551934 hemagglutinin protein HagA
34540040 2551409 lipoprotein, putative
34540364 2552375 extracellular protease, putative
34541613 2552074 hemagglutinin protein HagE
34540183 2551891 internalin-related protein

Shigella flexneri 2a str. 2457T

30065424 1080663 minor fimbrial subunit, D-mannose specific adhesin
30062726 1077662 putative adhesion and penetration protein
30063758 1078834 putative fimbrial-like protein
30065431 1080671 major type 1 subunit fimbrin (pilin)
30063366 1078379 flagellar protein FliD
30064308 1079668 outer membrane fluffing protein
30062613 1077555 flagellar hook protein FlgE
30061954 1076843 conserved hypothetical lipoprotein
30065173 1080393 putative lipase
30065425 1080664 minor fimbrial subunit, precursor polypeptide
30064485 1079637 putative fimbrial protein
30062615 1077558 flagellar basal body L-ring protein FlgH
30064307 1079452 outer membrane fluffing protein
30065601 1080859 putative glycoprotein/receptor
30062118 1077025 putative fimbrial-like protein
30064099 1079223 lipoprotein
30062616 1077559 flagellar basal body P-ring protein FlgI
30063546 1078596 putative fimbrial-like protein
30062940 1077910 putative outer membrane protein
30065426 1080665 minor fimbrial subunit, precursor polypeptide
30062779 1077721 putative outer membrane protein
30064194 1079329 putative lipoprotein
30063365 1078378 flagellin
30062298 1077222 outer membrane protein X
30064968 1080175 putative major fimbrial subunit
30061858 1076740 outer membrane pore protein E (E,Ic,NmpAB)
30062178 1080410 minor lipoprotein
30062479 1077412 putative fimbrial-like protein
30062565 1077506 minor curlin subunit precursor
30063880 1078972 putative outer membrane lipoprotein
30064531 1079686 cytoplasmic membrane protein
30065033 1080243 putative receptor protein

Streptococcus mutans UA159

24378550 1029610 putative secreted antigen GbpB/SagA; putative peptidoglycan hydrolase
24379087 1028055 cell surface antigen SpaP
24380463 1029310 putative membrane protein
24379075 1028046 penicillin-binding protein 2b
24378955 1027967 penicillin-binding protein 1a; membrane carboxypeptidase
24379801 1028662 glucan-binding protein C, GbpC
24379528 1029536 hypothetical protein; possible cell wall protein, WapE
24379231 1028158 putative glucan-binding protein D; BglB-like protein
24380488 1029325 conserved hypothetical protein; possible transmembrane protein
24380291 1029139 putative amino acid binding protein
24379342 1028247 putative penicillin-binding protein, class C; fmt-like protein
24380047 1028904 putative ABC transporter, branched chain amino acid-binding protein
24378698 1029755 putative ABC transporter, metal binding lipoprotein; surface adhesin precursor; saliva-binding protein; lipoprotein receptor LraI (LraI family)
24378708 1029768 putative transfer protein
24379427 1028331 cell wall-associated protein precursor WapA
24379272 1028196 putative amino acid transporter, amino acid-binding protein
24379641 1028511 putative ABC transporter, amino acid binding protein

Streptococcus pneumoniae R6

15902395 934801 Choline-binding protein
15902381 934810 Choline-binding protein F
15902165 932894 Surface protein pspA precursor
15904047 934859 Choline binding protein D
15904036 933487 Choline binding protein A
15903986 933069 Choline-binding protein
15903796 933669 Autolysin (N-acetylmuramoyl-L-alanine amidase)


Neisseria meningitidis Z2491

15794121 907145 putative membrane protein
15794144 907168 putative surface fibril protein
15793284 906275 truncated pilin
15793460 906456 IgA-specific serine endopeptidase
15793282 906273 fimbrial protein precursor (pilin)
15793337 906332 adhesin
15793253 906243 putative lipoprotein
15794356 907848 putative lipoprotein
15793684 906699 putative membrane protein
15793290 906281 truncated pilin
15793283 906274 truncated pilin
15793475 906471 haemoglobin-haptoglobin-utilization protein
15793406 906401 porin, major outer membrane protein P.I
15794985 907333 adhesin MafA2
15794344 907836 putative lipoprotein
15794622 908118 hypothetical outer membrane protein
15793599 906604 pilus-associated protein
15793763 906779 putative periplasmic binding protein


Streptococcus pyogenes MGAS8232

19745214 995235 putative secreted protein
19746570 994224 putative penicillin-binding protein 1a
19745593 994771 putative 42 kDa protein
19745813 993958 putative adhesion protein
19745225 994839 putative choline binding protein
19745828 995250 streptolysin S associated protein
19746229 995021 putative minor tail protein
19746909 994105 putative laminin adhesion
19745560 995061 putative cell envelope proteinase

Treponema pallidum subsp. pallidum str. Nichols

15639714 2611034 flagellar hook protein (flgE)
15639609 2611657 tpr protein J (tprJ)
15639111 2610909 tpr protein C (tprC)
15639125 2610968 tpr protein D (tprD)


SARS coronavirus

31581505 spike protein S [SARS coronavirus Frankfurt 1]
32187357 spike protein S [SARS coronavirus HSR 1]
32187342 spike glycoprotein [SARS coronavirus ZJ01]
30698329 putative spike glycoprotein S [SARS coronavirus TW1]
30421454 putative spike glycoprotein [SARS coronavirus CUHK-Su10]
30027620 S protein [SARS coronavirus Urbani]
29836496 1489668 E2 glycoprotein precursor; putative spike glycoprotein [SARS coronavirus]
30795145 spike glycoprotein [SARS coronavirus Tor2]
31416295 spike glycoprotein S [SARS coronavirus GD01]
30023954 putative E2 glycoprotein precursor [SARS coronavirus CUHK-W1]
30275669 spike glycoprotein S [SARS coronavirus BJ01]
29837498 3C-like proteinase nsp5-pp1a/pp1ab (3CL-PRO) [SARS coronavirus]
29837501 putative nsp8-pp1a/pp1ab [SARS coronavirus]
29837503 putative nsp10-pp1a/pp1ab; formerly known as growth-factor-like protein [SARS coronavirus]
29837502 putative nsp9-pp1a/pp1ab [SARS coronavirus]

Table 6: Hypothetical proteins predicted as putative adhesins by SPAAN in the genomes listed in Table 2
(Total number of proteins = 105)

Protein GI number    Gene ID

Escherichia coli O157:H7
13363955 915578 
13360000 914929 
13362244 912369 
13359999 914888 
13361583 917316 
13361172 913156 
13361131 913207 
13359780 914422 
13360571 912499 
13362197 912893 
13362260 912399 
13360947 913505 
13361464 917196 
13361635 917367 
13362421 916655 
13361463 917195 
Haemophilus influenzae Rd 
16272115 951058 
30995442 950581 
Helicobacter pylori J99 
4155526 889586 
4155712 889748 
4155632 889684 
4156035 889468 
4155499
Mycoplasma pneumoniae

13507870 877230 
13508239 877245 
13508109 876868 
13508025 877084 
13507838 876784 
13507883 877183 

13507871 877239 
13507944 877056 
13508241 876750 
13507942 877055 
13507840 877387 
13507867 877242 
13508201 877044 
13507941 876985 
13508114 877397 
Mycobacterium tuberculosis H37Rv 
15611014 886198 

15610173 887320 

15609513 885515 

15608094 885411 

15610958 886155 

15607528 886436 

15607678 887473 

15609587 885760 

15610708 887227 

15609526 885246 

15611033 886225 

15609028 885094 

15607730 887771 

15609121 885813 

15608255 885951
15608409 887039
15609124 885815 
15607734 887797 
Rickettsia prowazekii strain Madrid E 
15604649 883964 
15604322 883472 
15604659 883996 
15604417 883217 
Porphyromonas gingivalis W83 
34540233 2551594 
Shigella flexneri 2a str. 2457T 
30062687 1077638 
30062956 1080449 
30063681 1078754 
30065435 1080675 
30063891 1078983 
30063211 1078195 
30065233 1080463 
30064387 1079531 
30062638 1077590 
30065236 1080466 
30061839 1076721 
Streptococcus mutans UA159 
24378864 1029452 
24380475 1029319 
24380237 1029088 
24379203 1028139 
24380480 1029320 
24379275 1029489 
24379291 1028216 
24379295 1028215 
24379804 1028663
24379162 1029417
24378987 1029363 
24379179 1028118 
24379166 1028107 
24378827 1029444 
24380216 1029067 
Streptococcus pneumoniae R6 
15902140 932867 
15903446 934616 
15903916 934001 
15903848 933609 
15902832 934332 
15902372 934804 
15902152 932889 
Neisseria meningitidis Z2491 
15793668 906680 
15794714 907603 
Streptococcus pyogenes MGAS8232 

19747011 993608 
19747024 994165 

19747012 994373 
19746396 995057 
19746651 993824 
19745883 995045 
19745912 994077 

Treponema pallidum subsp. pallidum str. Nichols 
15639844 2611061 
15639720 2611059 

Table 7: The list of 198 adhesins found in bacteria 
PapG (E. coli) 

12837502 
7407201
7407207

7407205 

147096 

4240529 

7407203 

42308 

7443327 

78746 

18265934 

26111419 

26250987 

26109826 

26249418 

13506767 

42301 

78745 

129622 

147092 

13506906 

7407209 

147080 

281926 

7407199 

147100 

78744 

SfaS (E. coli)

477910 

264035 

42959 

134449 

96425 



FimH (E. coli)
26251208

26111640 

5524634 

29422425 

5524630 

29422435 

29422415 

10946257 

29422419 

11120564 

29422457 

11120562 

29422459 

5524632 

29422455 

29422453 

29422451 

29422449 

29422447 

29422445 

29422443 

29422437 

29422433 

29422431 

29422429 

29422427 

29422423 

29422421 

29422417 

729494 

1361011 

1790775
3599571

29422441 

12620398 

29422439 

5524628 

1787779 

1742472 

1742463 

15801636 

25321294 

12515169 

11120566

24051859 

24112911 

13360484 

15800801 

15830279 

25392018 

25500156 

12514120 

1787173 

16128908 

16501811 

16759519 

24051219 

24112354 

30040724 

30062478 

6650093 

5524636 

1778448 

Intimin (E. coli)
17384659

4388530 

1389879 

15723931 

4323336 

4323338 

4323340 

4323342 

4323344 

4323346 

4323348 

4689314 

PrsG (E. coli)

42523 

42529 

7443328 

7443329 

1172645 

HMW1 (Nontypeable H. influenzae) 

282097 

HMW2 (Nontypeable H. influenzae) 

5929966 

Hia (Nontypeable H. influenzae) 

25359682 
25359489 
25359709 
25359628 
25359414 
25359389 
21536216 
25359445 

HifE (H. influenzae)
MrkD (K. pneumoniae)
FHA (B. pertussis) 
Pertactin (B. pertussis) 
YadA (Y. enterocolitica) 

SpaP (S. mutans) 



13506868 

13506870 

13506872 

13506874 

13506876 

3688787 

3688790 

3688793 

2126301 

1170264 

1170265 

533127 

535169 

3025668 

3025670 

3025672 

3025674 

642038 

127307 

17154501 

33571840 

10955604 

4324391 

28372996 

23630568 

32470319 



26007028
PAc (S. mutans)

SspA (Streptococcus gordonii) 



CshA (Streptococcus gordonii)
CshB (Streptococcus gordonii) 
ScaA (Streptococcus gordonii) 
SspB (Streptococcus gordonii) 



SpaA (Streptococcus sobrinus) 
PAg (Streptococcus sobrinus) 



Protein F (Streptococcus pyogenes) 
PsaA (Streptococcus pneumoniae) 



CbpA / SpsA / PbcA / PspC
(Streptococcus pneumoniae)
47267

129552 

25990270 
1100971 

457707 

18389220 

310633 

25055226 
3220006 

546643 

217036 
47561 

19224134 

18252614 

7920456 

7920458 

7920460 

7920462 



14718654 
2425109
FimA (Streptococcus parasanguis)
SsaB (Streptococcus sanguis) 
EfaA (Enterococcus faecalis) 
FnbA (Staphylococcus aureus) 
FnbB (Staphylococcus aureus) 



BabA (Helicobacter pylori)
2576331
2576333 
3153898 
9845483 
19548141 

97883 

97882 

493017 

120457 

581562 

21205592 

13702452 

13309962 
13309964 
13309966 
13309968 
13309970 
13309972 
13309974 
13309976 
13309978 
13309980 
13309982 
13309984 
13309986 
13309988
13309990
13309992 
13309994 

Advantages:

1. The method helps in discovering putative adhesins, which are of great
importance in drug discovery and preventive therapeutics.

2. The method is useful in predicting the adhesive nature of even unique proteins,
because it is independent of the homology of the query proteins with other
proteins.

3. The method is easy to use. Only the amino acid sequence is required as
input; no other information is needed to predict the adhesive nature of a
protein.
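
Because the only input is an amino acid sequence, the first of the five attributes (amino acid frequencies) can be illustrated in a few lines. The following is a minimal sketch in Python, not the SPAAN implementation: the function names `read_fasta_sequence` and `amino_acid_frequencies` are hypothetical, and the choice to ignore non-standard residue codes is an assumption made here for illustration.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta_sequence(text):
    """Return the residues of the first record in a FASTA-formatted string."""
    residues = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts the next record
            seen_header = True
            continue
        residues.append(line)
    return "".join(residues).upper()


def amino_acid_frequencies(seq):
    """Fraction of each standard amino acid in seq (20 values summing to 1).

    Assumption for this sketch: characters outside the 20 standard
    one-letter codes are ignored rather than counted.
    """
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acid residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}
```

Calling `amino_acid_frequencies(read_fasta_sequence(record))` yields a 20-element vector of the kind that could be fed to the input layer of the amino acid frequency network.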
Claims

1. A computational method for identifying adhesin and adhesin-like proteins, said
method comprising steps of:

a. computing the sequence-based attributes of protein sequences using five
attribute modules of a neural network software, wherein the attributes
are (i) amino acid frequencies, (ii) multiplet frequencies, (iii) dipeptide
frequencies, (iv) charge composition, and (v) hydrophobic composition,

b. training an artificial neural network (ANN) for each of the five computed
attributes, and

c. identifying the adhesin and adhesin-like proteins having a probability of
being an adhesin (P_ad) > 0.51.

2. A method as claimed in claim 1, wherein the protein sequences are obtained 
from pathogens, eukaryotes, and multicellular organisms. 

3. A method as claimed in claim 1, wherein the protein sequences are obtained
from the pathogens selected from a group of organisms comprising Escherichia
coli, Haemophilus influenzae, Helicobacter pylori, Mycoplasma pneumoniae,
Mycobacterium tuberculosis, Rickettsia prowazekii, Porphyromonas
gingivalis, Shigella flexneri, Streptococcus mutans, Streptococcus pneumoniae,
Neisseria meningitidis, Streptococcus pyogenes, Treponema pallidum and
Severe Acute Respiratory Syndrome associated human coronavirus (SARS).

4. A method as claimed in claim 1, wherein the method is a non-homology 
method. 

5. A method as claimed in claim 1, wherein the method uses 105 compositional 
properties of the sequences. 

6. A method as claimed in claim 1, wherein the method shows sensitivity of at
least 90%.

7. A method as claimed in claim 1, wherein the method shows specificity of
100%.

8. A method as claimed in claim 1, wherein the method helps identify adhesins
from distantly related organisms.

9. A method as claimed in claim 1, wherein the neural network has multi-layer
feed forward topology, consisting of an input layer, one hidden layer, and an
output layer.

10. A method as claimed in claim 9, wherein the number of neurons in the input
layer is equal to the number of input data points for each attribute.

11. A method as claimed in claim 1, wherein the P_ad is a weighted linear sum of
the probabilities from the five computed attributes.

12. A method as claimed in claim 1, wherein each trained network assigns a
probability value of being an adhesin for the protein sequence.

13. A computer system for performing the method of claim 1, said system
comprising a central processing unit executing the SPAAN program, which gives
probabilities based on different attributes using an Artificial Neural Network,
together with other in-built programs for assessing attributes, all stored in a
memory device accessed by the CPU; a display on which the central processing
unit displays the screens of the above-mentioned programs in response to user
inputs; and a user interface device.

14. A set of 274 annotated genes encoding adhesin and adhesin-like proteins,
having SEQ ID Nos. 385 to 658.

15. A set of 105 hypothetical genes encoding adhesin and adhesin-like proteins,
having SEQ ID Nos. 659 to 763.

16. A set of 279 annotated adhesin and adhesin-like proteins of SEQ ID Nos. 1 to
279.

17. A set of 105 hypothetical adhesin and adhesin-like proteins of SEQ ID Nos. 280
to 384.

18. A fully connected multilayer feed forward Artificial Neural Network based on
the computational method as claimed in claim 1, comprising an input layer, a
hidden layer and an output layer which are connected in the said sequence,
wherein each neuron is a binary digit number and is connected to each neuron
of the subsequent layer for identifying adhesin or adhesin-like proteins, wherein
the program steps comprise:

[a] feeding a protein sequence in FASTA format;

[b] processing the sequence obtained in step [a] through the 5 modules
named A, C, D, H and M, wherein attribute A represents an amino acid
composition, attribute C represents a charge composition, attribute D
represents a dipeptide composition of the 20 dipeptides [NG, RE, TN,
NT, GT, TT, DE, ER, RR, RK, RI, AT, TS, IV, SG, GS, TG, GN, VI
and HR], attribute H represents a hydrophobic composition and attribute
M represents amino acid frequencies in multiplets, to quantify 5 types of
compositional attributes of the said protein sequence and to obtain numerical
input vectors respectively for each of the said attributes, wherein the total
number of numerical input vectors is 105;

[c] processing of the numerical input vectors obtained in step [b] by the
input neuron layer to obtain signals, wherein the number of neurons is
equal to the number of numerical input vectors for each attribute;

[d] processing of signals obtained from step [c] by the hidden layer to obtain
synaptic weighted signals, wherein the optimal number of neurons in the
hidden layer was determined through experimentation for minimizing
the error at the best epoch for each network individually;

[e] delivering synaptic weighted signals obtained from step [d] to the output
layer for assigning a probability value for each protein sequence fed
in step [a] as being an adhesin by each network module; and

[f] using the individual probabilities obtained from step [e] for computing
the final probability of a protein sequence being an adhesin, denoted by
the P_ad value, which is a weighted average of the individual probabilities
obtained from step [e] and the associated fraction of correlation, which is
a measure of the strength of the prediction.

19. A network as claimed in claim 18, wherein the input neuron layer consists of a
total of 105 neurons corresponding to 105 compositional properties.

20. A network as claimed in claim 18, wherein the hidden layer comprises
30 neurons for amino acid frequencies, 28 for multiplet
frequencies, 28 for dipeptide frequencies, 30 for charge composition and 30 for
hydrophobic composition.

21. A network as claimed in claim 18, wherein the output layer comprises
neurons to deliver the output values as a probability value for each protein
sequence.
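
The dipeptide module and the final combination step described in the claims above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the SPAAN source: the normalisation of dipeptide counts by the number of overlapping residue pairs is an assumption, the function names are hypothetical, and the per-module probabilities and fraction-of-correlation weights in the usage note are made-up placeholder values rather than outputs of trained networks.

```python
# The 20 dipeptides named for attribute D in the claims.
DIPEPTIDES = [
    "NG", "RE", "TN", "NT", "GT", "TT", "DE", "ER", "RR", "RK",
    "RI", "AT", "TS", "IV", "SG", "GS", "TG", "GN", "VI", "HR",
]


def dipeptide_frequencies(seq):
    """Frequency of each listed dipeptide among all overlapping residue pairs.

    Assumption for this sketch: counts are divided by (len(seq) - 1), the
    number of overlapping pairs; the exact normalisation is not stated in
    the text.
    """
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(dp) / len(pairs) for dp in DIPEPTIDES]


def combined_probability(module_probs, correlation_fractions):
    """Weighted average of per-module probabilities (the P_ad value).

    module_probs and correlation_fractions are dicts keyed by module name
    (A, C, D, H, M); each weight is that module's fraction of correlation.
    """
    if set(module_probs) != set(correlation_fractions):
        raise ValueError("probabilities and weights must cover the same modules")
    weight_sum = sum(correlation_fractions.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(module_probs[m] * correlation_fractions[m] for m in module_probs)
    return weighted / weight_sum


def is_putative_adhesin(p_ad, threshold=0.51):
    """Apply the P_ad > 0.51 cut-off stated in claim 1."""
    return p_ad > threshold
```

For example, with placeholder inputs `probs = {"A": 0.9, "C": 0.7, "D": 0.8, "H": 0.6, "M": 0.75}` and equal weights for all five modules, `combined_probability(probs, weights)` reduces to the plain mean of the five probabilities, and `is_putative_adhesin` then compares that value against the 0.51 cut-off.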
Figure 1: The Neural Network architecture
Figure 3(a)
Figure 3(c)
Application Project

<120> Title :
<130> AppFileReference :
<140> CurrentAppNumber :
<141> CurrentFilingDate :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MINLSKEATV GKALTPIAIL MMLSFPVASQ AAGLVIKNGT VYNANGVPW DINKPNGSGL 6 0 

SHNIWDNLNV DKNGWFNNS ANESSTSLAG NIQGNSNBTS GSAKVILNEV TSKNPSTING 12 0 

mmfv.vjjjk.t ij^.\t:,t.t vn~ggc ir-rj r^rLrx^rpn iQDDKi^crrs ^zr.Txzz.-:: 

15 LDNASPTEIL SRNVWNGKV SADELNWAG NNYVNAAGQV TGSVSATGSR NGYSVDVAKL 24 0 

GGMYANKISL VSTEKGVGVR NLGVIAGGVN GVS IDS KGNL LNSNAQIQSA STINLTTNGT 300 

LDNTTGTVTS VGTISBNTNK NTIVNTRAGKF ISTMGDIYVN SGTIDNTNGK LAAAGMLAVD 3 60 

TNNATIi I NS G KGSSVGIEAG LVALKTGTLN NSNGQIRGGY VGLESAALNN NNGDIQTTGD 420 

IAIISNGNVD NNKGLIRSST GHIVIGAAGS VNNGSTKTAD TGSSDSLGII ADTGVEIGAN 48 0 

20 NINNNGGQIA SNGNVSLSSY STIDDYAGKI LSNSKVTIKG SSLRNDTGGI SGKQGIEVAV 54 0 

GGSLTNNIGV ISSEEGDISL LANSVDNHGG FMMGQNITME S MS GVNNNTA LIVASKKLKI 600 

NARGSIENRD GNNFGNAYGL YFGMPQQTGG MVGKEGIELS GQNIYNNNSR LIAEDGPLTL 66 0 

QAQNTFDNTR ALVTSGADAS IQVGGTYYNN YATTWSAGNL D I DATTLQNS SSGTMIDNNA 72 0 

TGFIASDKNL SLEWNSLTN YGWISGKGDV DVTVNNGNLY MRNTIAAEKG LDIAALNGIE 780 

25 NWKDISAGGD LTMNTNRHVT NNSNSNMVGQ NIVINAVNDI NHRGNIVSDA DLNVT TKGNL 840 

YNYLYMVGYG DIALSANSVA NNNATIEATG DLIIDSKGNV GNNRGNLHAL NGVLSVKGNN 900 

LNNDNGEIRG YGDVTLALTG NYDSYKGSLT SETGDVTLTA NIVDNAYGLI AGENVSVDAK 960 

STIYNNTALI AANKKLVINA GGNLENRDGN NFLRNMGALF GITDNVGGIV GKEGVTLSAQ 102 0 

NVYNNNSSII AENGPLNLLS RGTLDNTRAL LSSGADAIIR AAGTFYNNYA TTYSAGNLDV 108 0 

30 YAASLNtfASD GRLEDNTATG VIASDKNLDL SVDNSVTNYG WIS GKGDVHF NVLKGTLYNR 1140 

NAIAADNALT INALNGVENF KDIVAGTALT IDTQKYVTNN SNSNMLGQTI AINAVNDINN 12 00 

RGNIVGDYSL GVKTTGNIYN YLNMIi S YGVA GVSANKVTNS GKDAVLGGFY GLALEANETD 1260 

NTGTIVGM 1268 
<212> Type : PRT

<211> Length : 1268

SequenceName : SEQ ID 1
SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MNTIHLRCLF RMNPLVWCLW ADVAAKLRSL KRYSVFTFQR MKFMNRTSPY YCRRSVLSLL 60 

ISALIYAPPG MAAFTPDVIG WNDETVDGS QRVDERGTTN NTHIINHGQQ NVYGGVSNGS 12 0 

45 LIESGGYQDV GRHNNYVGQS NNTTINGGRQ SIHDGGISTG TIIESGNQDV YKGGI SNGTT 180 

IKGGASRVEG GSANGTLIDG GSQIVKVQGH ADGTTINKSG SQDWQGSLA TNTTINGGRQ 240 

YVEQSTVETT T I KNGGEQRV YESRALDTTI EGGTQSLNSK STAKNTQIYS GGTQIIDNTS 3 00 

SSDVIEVYSG GVLDVSGGTA TNVTQHDGAI LKTNTNGTTV SGTNSEGAFS IHNHVADNVL 3 60 

L ENGGHLD IN AYGSANKTII KDKGTMSVLT NAKADATRI D NGGVMDVAGN ATNTI INGGT 42 0 

50 QNINNYGIAT GTNINSGTQN IKSGGKADTT IISSGSRQW EKDGTAIGSN ISAGGSLIVY 480 

TGGIAHGVNQ ETGSALVANT GAGTD I EGYN KLSHFTITGG EANYWLENT GELTWAKTS 540 

AKNTTIDAGG KLIVQKEAKT DSTRLNNGGV LEVQDGGEAK HVEQQSGGAL IASTTSGTLI 600 

EGTNS YGDAF YIRNSEAKNV VLENAGSLTV VTGSRAVDTI I NANGKMD V Y GKDVGTVLNS 660 

AGTQTIYASA TSDKANI KGG KQTV YGLAT E ANIESGEQIV DGGSTEKTHI NGGTQTVQNY 720 

55 GKAINTDIVS GLQQIMANGT AEGSIINGGS QIVNEGGLAE NSVLNDGGTL DVREKGSATG 780 

IQQSSQGALV ATTRATRVTG TRADGVAFS I EQGAANNILL ANGGVLTVES DTSSDKTQVN 84 0 

TGGRE I VKTK ATATGTTLTG GEQIVEGVAN ETTINDGGIQ TVS ANGEA I K TTINEGGTLT 900 

VNDNGKATD I VQNS GAALQT STANGIEISG THQYGTFSIS GNLATNMLLE NGGNLLVLAG 960 

TEARDSTVGK GGAMQNQGQD SATKVNSGGQ YTLGRSKDEF QALARAEDLQ VAGGTAIVYA 102 0 

60 GTLADASVSG ATGSLSLMTP RDNVTPVKLE GAIRITDSAT LTIGNGVDTT LADLTAASRG 108 0 

SVWLETSNNSC AGTSNCEYRV NSLLLNDGNV YLSAQTAAPA TTNGIYNTLT TNELSGSGNF 1140 

YLHTNVAGSR GDQLWNNNA TGNFKIFVQD TGVS PQSDDA MTLVKTGGGD ASFSLGNTGG 12 0 0 

FVDLGTYEYV LKSDGNSNWN LTNDVKPNPD PNPNPMPNPK PDPKPDPKPD PKPDPTPEPT 1260 

PTPVPEKRIT PSTAAVLNMA ATLPLVFDAE LNSIRERLNI MKASPHNNNV WGATYNTRNN 132 0 

65 VTTDAGAGFE QTLTGMTVGI DSPNDIPEGI ATLGAFMGYS HSHIGFDRGG HGSVGSYSLG 138 0 

GYASWEHESG FYLDGWKLN RFESNVAGKM SSGGAANGSY HSNGLGGHIE TGMRFTDGNW 1440 

NLTPYASLTG FTADNPEYHL SNGMESKSVD TRSIYRELGA TLSYNMRLGN GMEIEPWLKA 150 0 
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AVRKEFVDDN RVKVNNDGNF VNDLSGRRGI YQAGIKASFS STLSGHLGVG YSHGAGVESP 1560 
WNAVAGVNWS F 1571
<212> Type : PRT
<211> Length : 1571
SequenceName : SEQ ID 2

SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MAVKISGVLK DGTGKPVENC TIQLKARRNS ATVWNTVAS ENPDEAGRYS MDVEYGQYSV 60 

ILLVEGFPPS HAGTITVYED SQPGTLNDFL GAMTEDDVRP EALRRFELMV EEVARNASAV 12 0 

AQNTAA7* 1' KS ^CT^AOTtXftR?*! .AATHATDAAU SA^JiL'TSAG QAASSAQSAS SSAGTASTKA 130 

15 TEAS KS AAAA ESSKSAAATS AGAAKTSETN AAVSQQSAAT SASTATTKAS EAASSARDAS . 240 

AS KEAAKS S E TSAASSASSA AS S ATAAGNS AKAAKTSETN AKSSETAAEQ SASAAAGSKT 3 00 

AAALSASAAS TSAGQASASA TAAGKSAESA ASSASTATTK AGEATEQASA AASSASAAKT 3 60 

SETNAKASET SAESSKTAAA SSASSAASSA SSASASKDEA TRQASAAKSS ATTAS TKATE 42 0 

AAGSATAAAQ SKSTAESAAT RAETAAKRAE D I AS AVALED ASTTKKGIVQ LSSATNSTSE 4 80 

20 SLAATPKAVK AAYELANGKY TAQDATTAQK GIVQLSNATN STSEMLAATP KSVKAAYDLA 54 0 

NGKYTAQDAT TAQKGIVQLS SATNSASETL AATPKAVKAA NDNANGRVPS ARKVNGKALS 600 

SDITLTPKDI GTLNSTTMSF SGGAGWFKLA TVTMPQASSV VSITLIGGAG FNVGSPQQAG 660 

ISELVLRAGN GNPKGITGAL WQRTSTGFTN FAWVNTSGDT YDIYVAIGNY ATGVNIQWDY 72 0 

TSNASVTIHT S PAYS ANKPE GLTDGTVYSL YTPSEQFYPP GAPIPWPSDT VP S GYALMQG 78 0 

25 QTFDKSAYPK LAAAYPSGVI PDMRGWT I KG KPASGRAVLS QEQDGIKSHT HSASASSTDL 840 

GTKTTSSFDY GTKSTNNTGA HTHSVSGTAA SAGNHTHSVT GASAVSQWSQ NGSVHKWSA 900 

ASVNTSAAGA HTHSVSGTAA SAGAHAHTVG I GAHTHS VAI GSHGHTITVN AAGMAENTVK 960 

NIAFNYIVRL A 971 
<212> Type : PRT

<211> Length : 971

SequenceName : SEQ ID 3
SequenceDescription :

Sequence


<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKRVI TLFAV LLMGWSVNAW SFACKTANGT AIPIGGGSAN VYVNLAPAVN VGQNLWDLS 60 

TQIFCHNDYP ETITDYVTLQ RGSAYGGVLS NFS GTVKYS G SSYPFPTTSE TPRWYNSRT 12 0 

40 DKPWPVALYL TPVSSAGGVA IKAGSLIAVL I LRQTKNYNS DDFQFVWNIY ANNDVWPTG 18 0 

GCDVSARDVT VTLPDYPGSV PIPLTVYCAK SQNLGYYLSG TTADAGNS I F TNTASFSPAQ 240 

GVGVQLTRNG TIIPANNTVS LGAVGTSAVS LGL TANYART GGQVTAGNVQ SIIGVTFVYQ 3 00 

<212> Type : PRT
<211> Length : 300

SequenceName : SEQ ID 4
SequenceDescription :

Sequence


<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MGYVTGGLPM KNNRAWALIS GLILFSGTAP AADNLHFTGN LLGKSCTPVI NGNLLAEIHF 60 

PTIAASDLMQ RGQSDRVPLV FQLKDCKSTT AFNVKVTLMG TEDTDLPGFL SIDSSSSATG 12 0 

55 VGIGIETAGG AAVPINSTTG ASFPLNQGNN SVNFNAWLQT VNGRNVTSGD FTATMTVTFE 180 

YF 182 

<212> Type : PRT 

<211> Length : 182 

SequenceName : SEQ ID 5 
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKRHLNTSYR LVWNHITGTL WASELARSR GKRAGVAVAL SLAAVTSVPA LAADKWQAG 60
ETVNDGTLTN HDNQIVFGTA NGMTISTGLE LGPDSEENTG GQWIQNGGIA GNTTVTTNGR 120
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QWL EGGTAS DTVIRDGGGQ 
GGLATGTIIN TGAEGGPDSD 
DQSVHGRALN TTLNGGYQYV 
GGEATAVTQN TGGALVTSTA 
5 TLVDDGGTLA VSAGGKATSV 
NGGS FTVNAG GQAGNTTVGH 
EIRFDNQTTP NAALSRAVAK 
GGQATGKTWL AFTNVGNSNL 
NRDSDEDWYL RSENAYRAEV 

10 GHLGHDNNGG IARGATPESS 
DGSRAGTVRD DAGSLGGYLN 
LETGLPFSIT DNLMLEPQLQ 
TFGEGTSSRD TLRDSAKHSV 
NGTSLDLQAG IjEARIRENIT 

<212> Type : PRT

<211> Length : 949

SequenceName : SEQ ID 6
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKWHYIFCI ILFHLGLPCG YAANDGTCAT RGGTHTLSLN FPLTTVSAAN NVPGNTLIDI 60 
25 ANATSSENYS VLCNCDSKHS NGAYHEIYYT ADPAPGMVYS TTASGLAFYY LNEYVDVGTK 12 0 

ISVLNAGYTA VPFEHVSNQA TTTDHTCQGN KTTAVGVSLK TGADAKISFR IKRSINGTW 180 
IPITDIALLY ANISSTTTRG EAIAKVRISG SLTAPQSCQI NAGQVIYFDF DTIPASEFSS 240 
TAGQAITSRK ITKTVSIECT GMGYERTQKV DAS FTGTNRS SDDTMVATDN ADVGIKIYNK 3 00 

SNAEVSVNNG KLPADMGNTT I FGRKNGS VT FSAAPASFTG ARPQPGVFNA TATLTIEFVN 3 60 

30 

<212> Type : PRT 
<211> Length : 360 

SequenceName : SEQ ID 7 

SequenceDescription : 

35 

Sequence 



<213> OrganismName : Escherichia coli 0157 :H7 
<400> PreSequenceString : 

40 M S RYKTGHKQ PRFRYSVLAR CVAWANISVQ VLFPLAVTFT PVMAARAQHA VQPRLSMGNT 60 

TVTADNNVEK NVASFAANAG TFLSSQPDSD ATRNFI TGMA TAKANQEIQE WLGKYGTARV 120 

KLNVDKDFSL KDSSLEMLYP IYDTPTNMLF TQGAIHRTDD RTQSNIGFGW RHFSGNDWMA 180 

GVNTFIDHDL SRSHTRIGVG AEYWRDYLKL SANGYIRASG WKKSPDIEDY QERPANGWDI 240 

RAEGYLPAWP QLGASLMYEQ YYGDEVGLFG KDKRQKDPHA ISAEVTYTPV PLLTLSAGHK 3 00 

45 QGKSGENDTR FGLEVNYRIG EPLAKQLDTD SIRERRVLAG SRYDLVERNN NIVLEYRKSE 3 60 

VIRIALPERI EGKGGQTLSL GLWSKATHG LKNVQWEAPS LLAEGGKITG QGSQWQVTLP 420 

AYRPGKDNYY AISAVAYDNK GNTS KRVQTE WITGAGMSA DRTALTLDGQ SRIQMLANGN 480 

EQKPLVLSLR DAEGQPVTGM KDQIKTELTF KPAGNIVTRS LKATKSQAKP TLGEFTETEA 540 

GVYQSVFTTG TQSGEATITV S VDGMS KTVT AELRATMMDV ANSTLSANEP SGDWADGQQ 600 

50 AYTL TLTAVD SEGNPVTGEA SRLRFVPQDT NGVTVGAISE IKPGVYSAAV SSTRAGNWV 660 

RAFSEQYQLG TLQQTLKFVA GPLDAAHSSI TLNPDKPWG GTVTAIWTVK DAYDNPVTSL 720 

TPEAPSLAGA AAEGSTASGW TNNGDGTWTA QITLGSTAGE LEVMPKLNGQ NAAANAAKVT 780 

WADALSSNQ SKVSVAEDHV KAGESTTVTL VAKDAHGNA I SGLALSASLT GTASEGATVS 840 

SWTEKGNGSY VATLTTGGKT GELRVMPLFN GQPAATEAAQ LTVIAGEMSS ANSTLVADNK 900 

55 APTVKTTTEL TFTVKDAYGN PVTGLKPDAP VFSGAASTGS ERPSAGNWTE KGNGVYVSTL 960 

TLGSAAGQLS VMPRVNGQNA VAQPLVLNVA GDASKAEIRD MTVKVNNQLA NGQSANQITL 1020 

TWDTYGNPL QGQEVTLTLP QGVTSKTGNT VTTNAAGKAD I ELMSTVAGE HNISASVNGA 1080 

QKTVTVKFNA DAS TGQANL Q VDAAAQKVAN GKDAFTLTAN VEDKNGNPVP GSLVTFNLPR 1140 

GVKPLTGDNV WVKANDEGKA ELQWSVTAG T YE ITASAGN SQPSNTQTIT FVADKATATV 12 0 0 

60 SGIEVIGNYA LADGNAKQTY KVTVTDANNN LLKDSEVTLT ASPANLVLTP NGTAKTNEQG 12 60 

QAIFTATTTV AAKYTLTAKV SQADGQESTK TAESKFVADD TNAVLTASSD VTSLVADGIS 132 0 

TAKLEVTLMS ANNPVGGNMW VDIKTPEGVT EKDYQFLPSK NDHFVSGKIT RTFSTSKPGV 13 80 

YTFTFNALTY GGYEMKPVTV TITAVDADTA KGEEAMN 1417 
<212> Type : PRT 

65 <211> Length : 1417 

SequenceName : SEQ ID 8 
SequenceDescription : 



SLNGLAVNTT LNNRGEQWVH EGGVATGT 1 1 NRDGYQSVKS 180 

NSYTGQKVQG TAESTTINKN GRQIILFSGL ARDTL I YAGG 240 

HRDGLALNTV INEGGWQWK AGGAAGNTTI NQNGELRVHA 3 00 

ATVIGTNRLG NFTVENGKAD GWLESGGRL DVLESHSAQN 3 60 

TITSGGALIA DSGATVEGTN ASGKFSIDGT SGQASGLLLE 42 0 

RGTL TLAAGG SLSGRTQLSK GAS MVLNGDV VSTGDIVNAG 480 

SNSPVTFHKL TTTNLTGQGG TINMRVRLDG SNASDQLVIN 540 

GVATTGQGIR WDAQNGATT EEGAFALSRP LQAGAFNYTL 600 

PLYTSMLTQA MDYDRI LAGS RSHQTGVNGE NNSVRLSIQG 660 

GSYGFVRLEG DLLRTEVAGM SLTTGVYGAA GHSSVDVKDD 72 0 

LVHTSSGLWA D I VAQGTRHS MKAS SDNNDF RARGWGWLGS 780 

YTWQGLSLDD GQDNAGYVKF GHGSAQHVRA GFRLGSHNDM 840 

SELPVNWWVQ PSVIRTFSSR GDMS MGTAAA GSNMTFSPSR 900 

LGVQAGYAHS VCGCSAEGYN GQATLNMTF -v " £49 
Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MARGWASSEA SGAMTDWLNN FGTARISLGV DEDFSLKNSQ FDFLHPWYDT PDYLLFSQHT 60

LHRTDDRTQI NTGLGWRHFT SSWMSGINLF FDHDLSRYHS RAGLGAEYWR DYLKLSSNAY 120

IGLTGWRSAP ELDNDFEARP ANGWDLRAEG WLPAWPQLGG KLVYEQYYGD EVALFDKNDR 180 

QSNPHAITAG LNYTPFPLLT LSAEQRQGKQ GENDTRFAVD LTWQPSSSMQ KQLNPDEVAG 240 

RRSLAGSRYD LIDRNNNIVL EYRKKELIRL SLLDPVKGKS GEIKPLVSSL QTKYALKGYN 300

IEAAALEAAG GKVSTSGKDI TVTLPGYRFT NTPETDNTWS IDVTAEDVKG NIjSRHEQSMV 360

VIQAPTLSQK DSLLSVNPLT VAADKKSTTT LTVTAHDSDG TPVPGLALQT RSEGVQDITL 420 

SDWTDNGDGS YTQILTAGTT SGSVTLTPQI NGESAVKESI WNIVPWSS RDHSSITIDN 480

VSYYAGDDIK VTvVELKDDSN QPVAYQKEEL VKAVTVEIJSK PGATIVWHEE QPGVYAANYP 540

AYKQGTALiRA QLSLHNWNAP LQSHIYNIEA NQNKARVATL SATNNDVYAD KKTFNTLTIN 600

VTDESDNPLT NHQVTFKNEK GSAEFVEPPQ QNTDAYGVAT INMVSQVAEE NTISATLPNG 660 

FSQRIIAKFV SDSSTPKFKQ LVADPDTIIA GNSQGSTLTA IITDFHNNPL KDMKVNFVAP 720

GGSQLDNTTA TTDQSGIVRV HLTSSKAGSY SVDASLEVDK NIHQSVTITV VPNREQSVMT 780 

LNAGSGSAIA NNTNIVTLTA SVKDVYGHPL PDEDVKFTLP ASMTGNFTLS SETARTDANG 840 

DAWTLRGTK AGEFTVTATL TRNNTVAYQQ VTFIGDTNSA QLQPLTASLN SIVAGNSTGS 900

TLTATILDAY QNPLKDQLVT FQSNDVTLSE TEVTTNTLGQ ATVTMTSNIA GQHNVWSRK 960

AQASDNKTFS LSVLPDESSA KVISITGAEK TITVGENITL RILVQDAFNN VIAGQRVRLS 1020 

AQPTTNITIG DTAYTDNNGY AYVNLLSTQP GVYQVTATLD NNSSSKVDVN VANGKLELTS 1080 

SKPETTVHNS EGITLTATAR NARGELMPGQ IITFSVTPEG ATLSNTGEVL TDQSGQAKVT 1140 

LTSDKVNVYT VTAIMGKDVP VQSQVTVAVK ADAKTAHWS WASPDTITA DGIDSSTITS 1200

RVEDDYGFPV EGVDISHGLD TKGSPWNIP TTRTDQSGQV TATITSTLAE TLTVNVQVPG 1260 

TANQSATITL VAGTADESKS ILKSDVDTLK ADYQQSAKLT LTLQDKYGNP IVTSDHLEFV 1320 

QSGPFVNFLK LSDIDYSQRN YGEYTVTVTG GKEGTATLIP MLNGVHQANL SISLNLIQSI 1380 

KEMSGHVTAN NHTFSTAKFP SEGFAGAYYT LNNDNFEAGK TVDDYMFSSS QGWVSVDASG 1440 

KVSFANIGDQ TSVTISAVPR QGGTTYQTLI KLKGWWVNNG NHTNIWLAAN ALCHAKNDGY 1500

NLPGITHLTS GENKRTQGSL YGEWGNVGAF SSNSQFTPGA YWTSESDDYS RHYYVQMLTG 1560 

MTGSDADSSP QLTACRKSL 1579 
<212> Type : PRT 
<211> Length : 1579 

SequenceName : SEQ ID 9

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MITHGCYTRT RHKHKLKKTL IMLSAGLGLF FYVNQNSFAN GENYFKLGSD SKLLTHDSYQ 60

NRLFYTLKTG ETVADLSKSQ DINLSTIWSL NKHLYSSESE MMKAAPGQQI ILPLKKLPFE 120

YSALPLLGSA PLVAAGGVAG HTNKLTKMSP DVTKSNMTDD KALNYAAQQA ASLGSQLQSR 180

SLNGDYAKDT ALGIAGNQAS SQLQAWLQHY GTAEVNLQSG NNFDGSSLDF LLPFYDSEKM 240

LAFGQVGARY IDSRFTANLG AGQRFFLPAN MLGYNVFIDQ DFSGDNTRLG IGGEYWRDYF 3 00 

KSSVNGYFRM SGWHESYNKK DYDERPANGF DIRFNGYLPS YPALGAKLIY EQYYGDNVAL 360 

FNSDKLQSNP GAATVGVNYT PIPLVTMGID YRHGTGNEND LLYSMQFRYQ FDKSWSQQIE 42 0 

PQYVNELRTL SGSRYDLVQR NNNIILEYKK QDILSLNIPH DINGTEHSTQ KIQLIVKSKY 480 

GLDRIVWDDS ALRSQGGQIQ HSGSQSAQDY QAILPAYVQG GSNIYKVTAR AYDRNGNSSN 540

NVQLTITVLS NGQWDQVGV TDFTADKTSA KADNADTITY TATVKKNGVA QANVPVSFNI 600 

VSGTATLGAN SAKTDANGKA TVTLKSSTPG QVWSAKTAE MTSALNASAV IFFDQTKASI 660 

TE I KADKTTA VANGKDAIKY TVKVMKNGQP VNNQSVTFST NFGMFNGKSQ TQATTGNDGR 720 

ATITLTSSSA GKATVSATVS DGAEVKATEV TFFDELKIDN KVDIIGNNVR GELPNIWLQY 780 

GQFKLKASGG DGTYSWYSEN TSIATVDASG KVTLNGKGSV VIKATSGDKQ TVSYTIKAPS 840

YMIKVDKQAY YADAMS I CKN LLPSTQTVLS DIYDSWGAAN KYSHYSSMNS ITAWIKQTSS 900 

EQRSGVSSTY NLITQNPLPG VNVNTPNVYA VCVE 934 
<212> Type : PRT 
<211> Length : 934 

SequenceName : SEQ ID 10

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MLVLSESFKN KLLPMNGYMK GGSDSGSKAQ ARATEKGIEL QREMWQTNMQ NLAPFTPLAQ 60 
QYVSQLQNLS SLQGQGQALN QYYNSQQYKD LAGQARYQSL AAAEATGGLG STATGNQLAA 120

IAPTLGQNWL SGQMNNYNNL ANIGLGALTG QANAGQNYAN NVSQLYQQQA AASAANANKP 180
SGLQSFATGA IGGAASGAMI GSAVPVIGTG IGALAGGVIG GLGSLF 226
<212> Type : PRT 
<211> Length : 226 

SequenceName : SEQ ID 11

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKILSGLIL LLCCPYGFAA NGDGATHMSN LSFGPLTVAA ANNHSGYNIF EALSNTTGTY 60 

PVRGHCDDTH GGPGQQTAFF FIFYTGDAAP GLVLERTLMG LOTYALNDYL SVGVTIFIIN 120

2STQYAAI PFEH LSNQSTSPQH TCGAGNNGST VNLDSGRSAK IiSFYVRHSIT GTVTIPTTEV 180

AWLYAGMSDH FPKTTPVSKV TIRGQLTAPQ NCELTPNQSI DVDFQKINSA EFSSTAGSII 240 

AERKIKTEVT VSCTGMEDVR STEWSASMI AANRSADATM IVTSNPDVGI KIFDKNDRPV 3 00 

NVDGGNXj PAD MGAISRLGKT DGSVTFYSAP ASLTGAKPAP DNGFTATATL VIEFTN 356

<212> Type : PRT

<211> Length : 356

SequenceName : SEQ ID 12 
SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNKIYRLKWN RSRNCWSVCS ELGSRVKGKK SRAVLISAIS LYSSLVFADD VIVNQDKTID 60

FGKENQSIDY RITVTDNANL VINATDTSRP RLTLASGGGL DITGGKVTIN GPLNFLLKGT 120

GFLNVSNAGS ELYADDLYES NSGMRHDRGY FNVSNGGKIH VKGTSRLTYL QGNVSGEGSQ 180 

VTsTSETFFMGV YGSYGGNQYL SVNNGGEVNA RKQISLGYYD QVSDTTLAVS EGGKI S APT I 240 

SLSTNSELAL GAQEGSAAKA AGIIDAEKIE FWAKTSEKK ITLNHTDKDA TISADIVSGS 3 00 

EGLGYINALN GTTYLTGDNS AFSGKVKIEQ NGALGITQNI GTAEINNRGK LHL KADDSMT 3 60 

FANKISGNGT ISIDSGTVEL TGNNYAFSGY IDVASGAVAV ISEDKNIGRA ELDVDGKLQI 420

MANKDWVFDN DLEGRGI VE I NMGNHEFSFD EFAYTDWFQG SLAFQNTTFN LEKNAEFLQK 48 0 

GGITAGQGSL VTVGKGAHS I STLGFSGGTV DFGALTAGAQ MTEGTVNVSK TLDLRGEGVI 540 

QVSDSDWRS VSRDIDSALS LTEVDDGNST I KLVDAQGAE VLGDAGNLQL QDKNGQILSS 600 

SAQRDIQQMG QKAAVGTYDY RLTSGVNNDG LYIGYGLTQL DLHATD SDAL VLSSNGKSEN 660 

AADLSAKITG SGDLAFSSQK GQTVSLSNKD NDYTGVTDLR SGTLLLNNDN VLGNTHELRL 720

AAETELDMNG HSQTVGTLNG SADSLLSLNG GSLTVTNGGT STGSLTGSGE LNIQGGTLDI 780 

AGDNSNLTAN VNIANSANVL VSHAQGLGSA NVENNGTLAL NNSAEKRAAA SVNYALGGNL 840 

TNNGTLMTGM SGQQAGNVLV VKGNYHGNNG QLVMNTVLNG DDSVTDKLW EGDTS GTTAV 900 

TVNNAGGTGA KTLNGIELIH VDGKSEGEFV QAGRIVAGAY DYTLARGQGA NSGNWYLTSG 960 

SDSPELQPEP DPMPNPEPNP NPEPNPNPTP TPGPDLNVDN DLRPEAGSYI ANLAAANTMF 1020

XTRLHERLGN TYYTDMVTGE QKQTTMWMRH EGGHNKWRDG SGQLKTQSNR YVLQLGGDVA 1080

QWSQNGSDRW HVGVMAGYGN SDSKTISSRT GYRAKASVNG YSTGLYATWY ADDESRNGAY 1140 

LDSWAQYSWF DMTVKGDDLQ SESYKSKGFT ASLEAGYKHK LAEFNGSQGT RNEWYVQPQA 12 0 0 

QVTWMGVKAD KHRESNGTLV HSNGDGNVQT RLGVKTWLKS HHKMDDGKSR EFQPFVEVNW 1260 

LHNSKDFSTS MDGVSVTQDG ARNIAEIKTG VEGQLNANLN VWGNVGVQVA DRGYNDTSAM 1320

VGIKWQF 1327
<212> Type : PRT 
<211> Length : 1327 

SequenceName : SEQ ID 13

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MITMKKSVLT AFITWCATSS VMAADDNAI TDGSVTFNGK VIAPACTLVA ATKDSWTLP 60
DVSATKLQTN GQVSGVQTDV PIELKDCDTT VTKNATFTFN GTADTTQITA FANQASSDAA 120
TNVALQMYMN DGTTAIKPDT ETGNILLQDG DQTLTFKVDY IATGKATSGN VNAVTNFHIN 180
¥:Y 182 

<212> Type : PRT

<211> Length : 182 

SequenceName : SEQ ID 14 
SequenceDescription :
Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSKFVKTAIA ATMVMGAFAS TSTIAAGNNG TARFYGTIED SPCSIVPDDH KLEVDMGDIG 60 

SGILKNNGTS TPKAFQIHLQ DCVFDTQTTM TTTFTGNASS TNSGNYYTIY NTDTGAAFNN 120

VS IiAI GDAQG TSYKSGAGIE QKIVNDTATN KGKAKQTLDF KAWLVGAADA PDLGNFEANT 180

TFQITYL 187



<212> Type : PRT 

<211> Length : 187 

SequenceName : SEQ ID 15 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MRVIFLRKEY LSLLPSMIAS LFSANGVAAA IDLCQGYDIK ASCHASRQSL SGITQVWSIA 60

DGQWLVFSDM TNNASGGAVF LQQGAEFTLS PENETGMTLF ANNTVSGEYN NGGAIFAKEN 120

STLNLTDVIF SGNVAGGYGG AIYSSGTNDT GAIDLRVTNA VFRNNIANDG KGGAIYTINN 180 

DIYLSDDVFN NNQAYTSTSY SDGDGGAIDV TDNNSDSKHP SGYTIINNTA FTNNTAEGYG 240 

GAIYTNSATA PYLIDISVDD SYS QNGGVLV DENNSAAGYG DGPSSAAGGF MYLGLSEVTF 3 00 

DIADGKTLVI GNTENDGAVD SIAGTGLITK TGS GDIiVLNA DNNDFTGEMQ IENGEVTLGR 360

SNSLMNVGDT HCQDDPQDCY GLTIGSIDKY QNQAELNVGS TQQTFAHSLT GFQNGTLNID 420 

AGGNVTVNQG SFAGTIEGAG QLTIAQNGSY VLAGAQSMAL TGDIWDAGA VLSLEGDAAD 480 

XiAALQDDPQS IVLNGGMLDL SDFSTWQSGT SYKDGLEVSG SSGTVIGSQD WDLAGGNDM 540 

HIGGDGKDGV YWIDAGDGQ VSLANDNQYL GTTQIASGTL MVSDNSQLGY THYNRQVIFT 600

DKPQESVMEI TANVDTRSTT TEHGRDIEMR ADGEVAVDAG VDTQWGALMA DSSGQHQDEG 660

STLTKTGAGT LELTASGTTQ SAVRVEEGTL QGDVADIFPY ASSLWVGDGA TFVTGADQDI 720 

QSIDATSSGT IDISDGTVLR LTGQDTSVAL NASLFNCDGT LVNATDGVTL TGELNTNLET 780 

DSLTYLSNVT VNGNLTNTSG AVSLQNGVAG DTLTVNGDYT GGGTLLLDSE LNGDDSVSDQ 840 

LVMNGNTAGN TTVWNSITG IGEPTSTGIK WDFAADPTQ FQNNAQFSLA GSGYVNMGAY 900

DYTLVEDNND WYLRSQEVTP PSPPDPDPTP DPDPTQDPDP TPDPEPTPAY QPVTiNAKVGG 960

YLNNLRAANQ AFMMERRDHA GGDGQTLNLR VIGGDYHYTA AGQLAQHEDT STVQLSGDLF 1020 

SGRWGTDGEW MLGIVGGYSD NQGDSRSSMT GTRADNQNHG YAVGLTSSWF QHGKQKQGAW 10 80 

LDNWLQYAWF SNDVSEHEDG VDHYHSSGII ASLEAGYQWL PGRGWIEPQ AQVIYQGVQQ 1140 

DDFTAANRAR VSQSQGDDIQ TRIiGLHSEWR TAVHVIPTLD LNYYHDPHST EIEEDASTIS 12 00 

DDAVKQRGEI KVGVTGNISQ RVSLRGSVAW QKGSDDFAQT AGFLSMTVKW 1250



<212> Type : PRT 

<211> Length : 1250 

SequenceName : SEQ ID 16 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MHSWKKKLIV SQLALACTLA ITSQANAATN DISGQTYNTF HHYNDATYAD DVYYDGYVGW 60

NNYAADSYYN GDIYPVINNA TVNGVISTYY LDDGISTNTN ANSLTIKNST IHGMITSECM 120 

TTDCADDRAT GYVYDRLTLS VDNSTIDDNY EHYTYNGTYN NAADTHWDV YDMGTAITLD 180 

QEVDLSITNN SHVAGITLTQ GYEWEDIDDN TVSTGVNSSE VFNNTITVKD STVTSGSWTD 240 

EGTTGWFGHT GNASNYSNTL TADDVAIAAI ANPYADNAMQ TTVTLDNSTL MGDWFSSNF 3 00 

DENFFPQGAN SYRDADGDVD TNGWDGTDRM DVTLNNGSKW VGAAMSVHMV DEDGDGSYDG 360

YAVGTEATAT LLDIAANSLW PSSTVGVDNI NTQYDENGHI VGNEVYQSGL FNVTLNGGSE 420 

WDTTKSSLID TLSINSGSQV NVADSRLISD TVSLTGGSNL NIGEDGHVAT NTLTIDNSTV 480 

KMSDDVSAGW GLEDAALYAN TITVTNDGLL DINVDQFDAN PFQADTLNLT STTDTNGNIH 540 

AGVFDIHSSD YVMDTDLVND RTNDTTKSNY GYGLIAMNSD GHLTINGNGD NDNTASIEAG 600 

QNEVDNNGDH VAAATGNYKV RIDNATGAGS IADYNGNELI YVNDKNSNAT FSAANKADLG 660

AYTYQAEQRG NTWLQQMEL TDYANMALSI PSANTNIWNL EQDTVGTRLT NSRHGLADNG 720

GAWVSYFGGN FNGDNGTINY DQDVNGIMVG VDTKIDGNNA KWIVGAAAGF AKGDMNDRSG 780

QVDQDSQTAY IYSSAHFANN VFVDGSLSYS HFNNDLSATM SNGTYVDGST NSDAWGFGLK 840

AGYDFKLGDA GYVTPYGSIS GLFQSGDDYQ LSNDMKVDGQ SYDSMRYELG VDAGYTFTYS 900

EDQALTPYFK LAYVYDDSNN DNDVNGDSID NGTEGSAVRV GLGTQFSFTK NFSAYTDANY 960

LGGGDVDQDW SANVGVKYTW 980
<212> Type : PRT
<211> Length : 980

SequenceName : SEQ ID 17 
SequenceDescription :

Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MKLKHVGMIV VSVLAMSSAA VSAAEGDESV TTTVNGGVIH FKGEWNAAC AIDSESMNQT 60 
VELGQVRSSR LAKAGDLSSA VGFNIKLNDC DTNVSSNAAV AFLGTTVTSN DDTLALQSSA 120 
AGSAQINVGIQ ILDRTGEVLI LDGATFSAKT DLIDGTNILP FQARYIALGQ SVAGTANADA 180 
TFKVQYL 187 
<212> Type : PRT 
<211> Length : 187

SequenceName : SEQ ID 18 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKLLKVAAIA AIVFSGSALA GWPQYGGGG GNHGGGGNNS GPNSELNIYQ YGGGNSALAL 60
QADAR2STSDLT ITQHGGGNGA DVGQGSDDSS IDLTQRGFGN SATLDQWNGK DSHMTVKQFG 120

GGNGAAVDQT ASNSTVNVTQ VGFGNNATAH QY 152

<212> Type : PRT 
<211> Length : 152 

SequenceName : SEQ ID 19 

SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MP I GNXjGHNP NVNNSIPPAP PLPSQTDGAG GRGQLINSTG PLGSRALFTP VRNSMADSGD 60 
NRASDVPGLP VNPMRLAASE ITLNDGFEVL HDHGPLDTLN RQIGSSVFRV ETQEDGKHIA 120 
VGQRNGVETS WLSDQEYAR LQS IDPEGKD KFVFTGGRGG AGHAMVTVAS DITEARQRIL 180 
ELLEPKGTGE SKGAGESKGV GELRESNSGA ENTTETQTST STSSLRSDPK LWLAL GTVAT 240 
GIiIGLAATGI VQALALTPEP DSPTTTDPDA AASATETATR DQLTKEAFQN PDNQKVNIDE 300

LGNAIPSGVL KDDWANIEE QAKAAGEEAK QQAIENNAQA QKKYDEQQAK RQEELKVSSG 360

AGYGLSGALI LGGGIGVAVT AALHRKNQPV EQTTTTTTTT TTTSARTVEN KPANNTPAQG 420 
NVDTPGSEDT MESRRSSMAS TSSTFFDTSS I GTVQNPYAD VKTSLHDSQV PTSNSNTSVQ 480 
NM6NTDSWY STIQHPPRDT TDNGARLLGN PSAGIQSTYA RLALSGGLRH DMGGLTGGSN 540 
SAVNTSNNPP APGSHRFV 558 
<212> Type : PRT 
<211> Length : 558 

SequenceName : SEQ ID 20 

SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MFSTFKKAAL LAAIALPFST MAAPTVTFQG EVTDQTCSVN INGQTNSWL MPTVAMADFG 60 
ATLADGQSAG QTPFTVSVSN CQAPTGADQA INTTFLGYDV DASTGVMGNR DTSSDAAKGF 120
GIQLMDSSTS GNPVTLAGAT NVPGLTLKVG DTEASYDFGA RYFVIDSAAA TAGKI TAVAE 180 
YTLSYL 186 
<212> Type : PRT 
<211> Length : 186 

SequenceName : SEQ ID 21 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MNSEGGKPGN VLTVNGNYTG NNGLMTFNAT LGGDNSPTDK MNVKGDTQGN TRVRVDNIGG 60
VGAQTVNGIE LIEVGGNSAG NFALTTGTVE AGAYVYTLAK GKGNDEKNWY LTSKWDGVTP 120
ADTPDPINNP PWDPEGPSV YRPEAGSYIS NIAAANSLFS HRLHDRLGEP QYTDSLHSQD 180

SASSMWMRHV GGHERSSAGD GQLNTQANRY VLQLGGDLAQ WSSNAQDRWH LGVMAGYANQ 240

HSNTQSNRVG YKSDGRI SGY SAGLYATWYQ NDANKTGAYV DSWALYNWFD NSVSSDNRSA 3 00 

DDYDSRGVTA SVEGGYTFEA GTCSGSEGTL NTWYVQPQAQ ITWMGVKDSD HARKDGTRIE 3 60 

TEGDGNVQTR LGVKTYLNSH HQRDDGKQRE FQPYIEANWI NNSKVYAVKM NGQTVSRDGA 420

RNLGEVRTGV EAKVNNNLSL WGNVGVQLGD KGYSDTQGML GVKYSW 466 
<212> Type : PRT 
<211> Length : 466 

SequenceName : SEQ ID 22 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSYLMLRLYQ RNTQCLHIRK HRLAGF FVRIi FVACAFAVQA PLSSAELYFN PRFIiADDPQA 60 

VADLSRFENG QELPPGTYRV DIYLNNGYMA TRDVTFNTGD SEQGIVPCLT RAQLASMGLN 12 0 

TASVAGMNLL ADDACVPKTT MVQDATAHLD VGQQRLNLTI PQAFMSNRAR GYIPPELWDP 180 

GINAGLXiNYN FSGNSVQNRI GGNSHYAYLN LQSGLNIGAW RLRDNTTOSY NSSDRSSGSK 240 

NKWQHINTWL ERDIIPLRSR LTLGDGYTQG DIFDGINFRG AQLASDDNML PDSQRGFAPV 300

IHGIARGTAQ VTIKQNGYDI YNSTVPPGPF TINDIYAAGN SGDLQVTIKE ADGSTQIFTV 3 60 

PYSSVPLLQR EGHTRYSITA GEYRSGNAQQ EKPRFFQSTL LHGLPAGWTI YGGTQLADRY 420 

RAFNFGIGKN MGALGALSVD MTQANSTLPD DSQHDGQSVR FLYNKSLNES GTNIQLVGYR 480

YSTSGYFNFA DTTYSRMNGY NIETQDGVIQ VKPKFTDYYN LAYNKRGKLQ LTVTQQLGRS 540 

STLYLSGSHQ TYWGTSNVDE QFQAGLNTAF EDINWTLSYS LTKNAWQKGR DQMLARNVNI 600

PFSHWLRSDS KSQWRHASAS YSMSHDLNGR MTNLAGVYGT LLEDNNLSYS VQTGYAGGGD 660

GNSGSTGYAT LNYRGGYGNA NIGYSHSDDI KQLYYGVSGG VLAHANGVTL GQPLNDTWL 720

VKAPGAKDAK VENQTGVRTD WRGYAVLPYA TEYRENRVAL DTNTLADNVD LDNAVANWP 780 

TRGAIVRAEF KARVGIKLLM TLTHNNKPLP FGAMVTSESS QSSGIVADNG QVYLSGMPLA 840 

GKVQVKWGEE ENAHCVANYQ LPPESQQQLL TQLSAECR 878
<212> Type : PRT 
<211> Length : 878 

SequenceName : SEQ ID 23 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
MQIIFGEKCV SLLRLFFAAV LMLWCAQTAA YSGQCHTTQG NPYIGVNFGV KTLEEEENTT 60
GWKDKFYQW NESNDYYVSC DCDKDNVRSG RWAFAADSPL VYLGDNWYKI NDYLAAKVLL 120
QVKGSSPTAV PFENVGTGAD TRWHICDPGG QRLGGQGASG NSGSFSLKIL QPFVGSWIP 180
PMALARLFEC YNIPAGDSCT TTGTPVLVYY LSGTINSLGS CSVNAGETIE VDLGDVFAAN 240
FRWGHKPLG ARTAELAIPV RCNTGNAGLV NVNLSLTATT DPSYPQAIKT SRPGVGVWT 300

DSQNNXISPA GGTLPLSIPD DADSIA 326
<212> Type : PRT 
<211> Length : 326 

SequenceName : SEQ ID 24 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKIKTLAIW LSALSLSSTA ALAAATTVNG GTVHFKGEW NAACAVDAGS VDQTVQLGQV 60
RTASLAQDGA TSSAVGFNIQ LNDCDTNVAS KAAVAFLGTV IDAGHTNVLA LQSSAAGSAT 120
NVGVQILDRT GAALTLDGAT FSEQTTLNNG TNTIPFQARY YAIGEATPGA ANADATFKVQ 180
YQ 182
<212> Type : PRT

<211> Length : 182

SequenceName : SEQ ID 25 
SequenceDescription : 



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 
MKLKVIATLI ATVAVGVSFN SNFASASTTS ASLTVNSNLT MGTCSAQIMD NSNKVINEW 60
FGNVYISELG AKSKVQQFKI RFSNCSGLPQ NSAQIVLAPN GISCAGSQSS SAGFSNKFTD 120
ASAATRTAVE VWTTDTPESN GSTQFHCAQK IPVPVTLPAD TTTQPYDYPL SARMTVAEGR 180
LVTDVRPGNF RSPTTFTITY Q 201
<212> Type : PRT
<211> Length : 201 

SequenceName : SEQ ID 26 

SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

EA3TVEYGET VDGWLEKDI QLVYGTANNT KINPGGEQHI KEFGISSNTE INGGYQYIEM 60 

NGTAEYSVLN DGYQIVQMGG AANQTTLNNG VLQVYGAAND PTIKGGRLIV EKDGITVLAA 120

IEKGGLLEVK EGGLAIAVDQ KAGGAIKAST RVMEVFGTNR LGQFEIKNGI ANNMLLENGG 180 

SLRVEENDFA YNTTVDSGGL LEVMDGGTAT GVDKKAGGKL IVSTNALEVS GTNSKGQFSI 240 

KDGVSKNYEL DDGSGLIVME DTQAIDTILD EHATMQSLGK DTGTRVQANA VYDLGRSDQN 300 

GSITYSSKAI SENMVINNGR ANVWAGTMVN VSVRGNDGIL EVMKPQINYA PAMLVGKVW 360

SEGASLRTHG AVDTSKADVS LENSAWTIIA DITTTNQNTR LNLANLAMSG ANVIMMDESV 420

TRSSVTASAE NFTTLTTNTL SGNGNFYMRT DMANHQSDQL NVTGQATGDF KIFVTDTGAS 48 0 

PAAGDSLTLV TTGGGDAAFT LGNAGGWDI GTYEYTLLDN GNHSWSLAEN RAQITPSTTD 540 

VIjNMAAAQPL. VFDAELDTVR ERLGSVKGVS YDTAMWSSAI NTRNNVTTDA GAGFEQTLTG 60 0 

LTLGIDSRFS REESSTIRGL FFGYSHSDIG FDRGGKGNVD SYTLGAYAGW EHQNGAYVDG 660 

WKVDRFANT IHGKMSNGAT AFGDYNSNGA GAHVESGFRW VDGLWSVRPY LAFTGFTTDG 720

QDYTLSNGMR ADVGNTRILR AEAGTAVSYH MDLQNGTTLE PWLKAAVRQE YADSNQVKVN 780 

DDGKFNNDVA GTRGVYQAGI RSSFTPTLSG HLSVSYGNGA GVESPWNTQA GWWTF 836

<212> Type : PRT
<211> Length : 836

SequenceName : SEQ ID 27 
SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MQRKGNKLLI QLCSVILLFF TTSWYALANE CYIERNAEGD YHMKISSTQL SLASQMVEVP 60 
TEIAEATWDV NIQLRGDAIG CKSLGDSKAV HFLNTADPSL ISTYTTTNGA ALLKTTVPGI 120 

VYSVELLCLS CGAADELDLW LPAQSGADNF IPSTQTKWAY EYSDQSWYLR FRLFITPEFK 180
PKNGVSSGTT IAGKIASWYI GTNDQPWINF YIDNDSLKFF VDEPTCATVA LAQDQGNVSG 240 
NQVTLGNSYV SEVKNGLTRE IPFSIRAEYC YASKITVKLK AANKPSDATL VGKTTGSASG 300

VAVKVNSTYD NSKVLLKADG SNTVDYNFAA WSNNLLFLPF TAQLVPDGSG NAVGVGTFSG 360
NATFSFTYE 369

<212> Type : PRT

<211> Length : 369

SequenceName : SEQ ID 28 
SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MYQFTHQKSR IPKKTLLAAC CALFYSSNGA AADTVEYDSS FLMGTGASTI DVKRYAQGNP 60

TPPGLYNVRV FVNGQATSSL EIPFVDIGEN SAAACLTHKN LAQLHIKQPE QPVTLLAREG 120

EEEDCLDLAK SYEKADVCFD GSDQFLDLTI PQAYVLKSYG GYVDPSLWES GINAATLAYT 180 

LNAYHTS SDN DNSDSVYGAF NSGINLGAWH FRARGNYNWT TDNGSDFDFQ DRYLQRDI PA 240 

IRSQIIMGDA YTTGETFDSV NVRGVRLYSD SRMLPSALAS YAP T I RGVAN SNAKVTVTQS 3 00 

GYKIYETTVP PGEFVIDDIS PSGFGSELW TIEEADGSKR TFTQPFSSW QMQRPGVGRW 360 

DFSAGKVIDD SLRSEPNMGQ ASYYYGLNNL FTGYTGIQFT DNNYLAGLLG VGINTSIGAF 420

AVDVTHS RAE IPDDKTYQGQ SYRVTWNKLF QDTGTSFNLA AYRYSTQDYL GLHDALVLID 480 

DAKHLSADED KNTMQTYSRM KNQFTVSINQ PLNIAYEDYG SLFISGSWTY YWAANNSRTE 540 

YNVGYSKSVS WGSFSVNLQR SWNEDGEKDD AMYVSVSVPI ENILGGKRKS SGFRNLNTQL 600 

NTDFDGSHQL NVNSSGNTEN NLVNYSVNAG YSLDKNAGDL ASVGGYLNYE SGLGGISASA 660 

SATSDNSQQY SISTDGGFVL HSGGLTFTNN SFSSNDTLVL INALGAKGAR INNSNNEIDR 720

WGYAVTSSVS PYRENRVGLN IETLENDVEL KSTSATTVPR SGSWLTRFE TDEGRSAVLN 78 0 

ITAANGKSIP FAAEVYQGEV MIGSMGQGGQ AFVRGINDSG ELIVRWYENN QTIDCKLHYQ 840
FPAQPQTQGS TNTLLL3STNLT CQVANH 866
<212> Type : PRT 
<211> Length : 866 

SequenceName : SEQ ID 29 

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKFKRLLHSG IASLSLVACG VNAATDLGPA GDIHFSITIT TKACEMEKSD LEVDMGTMTL 60 
QKPAAVGTVL SKKDFTr ELK ECDGISKATV EMDSQSDSDD DSMFALEAGG ATGVALKIED 120

DKGTQQVPKG SSGTPIEWAI DGETTSLHYQ ASYVWNTQA TGGTANALVN FSITYE 176

<212> Type : PRT

<211> Length : 176

SequenceName : SEQ ID 30
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKYNNIIFLG LCLGLTTYSA LSADSVIKIS GRVLDYGCTV SSDSLNFTVD LQKNSARQFP 60 
TTGSTSPAVP FQITLSECSK GTTGVRVAFN GIEDAENNTL LKLDEGSNTA SGLGIEILDG 120
NMRPVKLNDL HAGMQWXPLV PEQNNILPYS ARLKSTQKSV NPGLVRASAT FTLEFQ 176 

<212> Type : PRT 
<211> Length : 176
SequenceName : SEQ ID 31

SequenceDescription :

Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKWRKRGYLL AAILALASAT IQAADVTITV NGKWAKPCT VSTTNATVDL GDLYSFSLMS 60
AGAASAWHDV ALELTNCPVG TSRVTASFSG AADSTGYYKN QGTAQNIQLE LQDDSGNTLN 120
TGATKTVQVD DSSQSAKFPL QVRALTVNGG ATQGTIQAVI SITYTYS 167
<212> Type : PRT

<211> Length : 167 

SequenceName : SEQ ID 32 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKRAPLITGL LLISTSCAYA SSEGCGADST SGATNYSSW DDVTVNQTDN VTGREFTSAT 60
LSSTNWQYAC SCSAGKAJVKL VYMVSPVLTT TGHQTGYYKL NDSLDIKTMN RPGNPGD 117

<212> Type : PRT 
<211> Length : 117 

SequenceName : SEQ ID 33 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKALLAAAL VMASGSALAV DGGHIDFNGM VQSGTCKVGV VDTGMHSVTT DGWTLDTAN 60 
VTDTFAEVSA TAVGLLPKEF MISVECDPGA PKNAELTMGS ASYANTSGTL NNNMNITVNG 120 
IAPAQNVNIA VHNMKNKAGA AEIKQVHMNN SSEVQELTLD AEGKGQYVFN ASYVKAPNSP 180
AVTAGHVTTN ALYTVAYK 198

<212> Type : PRT

<211> Length : 198 

SequenceName : SEQ ID 34 
SequenceDescription :
Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKPNMIVGAL ALTSVFMAGH LQAADGTVHF RGEIIDSTCE VTPETKDQW DLGKVNRTAF 60 
SGVDDVAAPT AFSIDLTQCP ETFKSAAIRF DGNEDAHGNG NLAIGTPLDN SNDAAAGISP 120

SDNSGDYTGA GAVSAAKGVA IRLYNRADNT QVKLYENSAS TPISNGNASM KFMARYIATE 180
TTIDPGTANA DSQFTVEYIK 200

<212> Type : PRT 
<211> Length : 200 

SequenceName : SEQ ID 35 

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MPIFQREGHIj KYSFAAGEYQ AGNYDSASPR FGQLDLIYGL PWGMTAYGGV LISNNYNAFT 60
LGIGKNFGYI GAISIDVTQA KSELNNDRDS QGQSYRFLYS KSFESGTDFR LAGYRYSTSG 120
FYTFQEATDV RSDADSDYNR YHKRSEIQGN LTQQLGAYGS VYLNLTQQDY WNDAGKQNTV 180

SAGYNGRIGK VSYSIAYSWN KSPEWDESDR LWSFNISVPL GRAWSNYRVT TDQDGRTNQQ 240 
VGVSGTLLED RNLSYSVQEG YASNGVGNSG NANVGYQGGS GNVNVGYSYG KDYRQLNYSV 300

RGGVIVHSEG VTLSQPLGET MTLISVPGAR NARWNNGGV QVDWMGNAIV PYAMPYRENE 360

ISLRSDSLGD DVDVENAFQK WPTRGAIVR ARFDTRVGYR VLMTLLRSAG SPVPFGATAT 420
LITDKQNEVS SIVGEEGQIiY ISGMPEEGRV LIKWGNDASQ QCVAPYKLSL ELKQGGIIPV 480
SANCQ 485 
<212> Type : PRT 

<211> Length : 485

SequenceName : SEQ ID 36
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSGYTVKPPT GDSNEQTQFI DYFNLFYSKR DQEQISISQQ LGNYGATFFS ASRQSYWNTS 60 
RSDQQISFGL NVPFGDITTS LNYSYSNNIW QNDRDHLLAF TLNVPFSHWM RTDSQSAFRN 120

SNASYSMSND LKGGMTNLSG VYGTLLPDNN LNYSVQVGNT HGGNTSSGTS GYSTLNYRGA 180
YGNTNVGYSR SGDSSQIYYG MSGGIIAHAD GITFGQPLGD TMVLVKAPGA DNVKIENQTG 240
IHTDWRGYAI LPFATEYREN RVALNANSLA DNVELDETW TVIPTHGAIA RATFNAQIGG 300

KVLMTLKYGM KSVPFGAIVT HGENKNGSIV AENGQVYLTG LPQSGKLQVS WGNDKNSNCI 360

VDYKLPEVSP GTLLNQQTAI CR 382

<212> Type : PRT

<211> Length : 382 

SequenceName : SEQ ID 37
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSALYERSQL TQVMISSAPA TAETMEKAEY LRLDCTIKEV QFTAGQKQDI DVTTLCSTEQ 60 

ENINGLGASS EISMSGNFYL NQAQNALRDA YDNDTVYAFK VQFPSGKGFK FLAEVRQHTW 120

SSGTNGWAA TFSLRLKGKP VSYWPLAFV KNLDKTLTVN TGALLTMSVS VNGGTPPYKH 180

AWKKDGQPVE GQTTDTFSKP GAQSGDKGAY TCEVTDSAEQ PQSITSDACT VTVNGAGG 238

<212> Type : PRT
<211> Length : 238

SequenceName : SEQ ID 38
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :



WO 2005/076010 



PCT/IN2005/000037 



12/341 



MRNKPFYLLC AFLWLAVSHA LAADSTITIR GYVRDNGCSV AAESTNFTVD LMENAAKQFN 60
NIGATTPWP FRILLSPCGN AVSAVKVGFT GVADSHNANL LALENTVSAA SGLGIQLLNE 120
QQNQIPLNAP SSAISWTTLT EGKPNTLNFY ARLMATQVPV TAGHINATAT FTLEYQ 176


<212> Type : PRT 

<211> Length : 176 

SequenceName : SEQ ID 39 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNKSWSISA AMLVLLCQPV MGSEISPATP SDEDNYTF0P QLFRGSRFSQ SSLAKLTTKE 60 

SVAPGNYKMD IYTNNKLSGS WNVTFKEAAD GRVLPCLTPE VADAIGLKTG EDKGEKDPVC 120 

TFAKELAPGI TSQTQLSQLR LDLSVPQSQL ISRPRGYVPP SELDTGASLA FMNYIANYYN 180

VAYSGQNAHS QRSLWASFNG GINLGAWQYR QLSNMTWDND KGNQWNNIRS YLQRPLPAIN 240

SQLMMGQLIT SGRFFSGLSY HGVSLATDER MLPDSMRGYA PTIRGVAATN ARVSVMQNGH 300

EIYQTTVAPG PFEINDLYPT SYSGDLDVTV TEANGAVSRF SVPFSAVPES MRPGTSRYNV 360

EVGKTQDSGD DSMFGDLTWQ HGMTNTLTFN SGSRIADGYQ ALMLGGVYGS SLGAFGANLT 420

WSHARVPESE AQSGWMSQLT WSKTFQPTST TVSLAGYRYS TSGYRDLADV LGERHAASNK 480

QSWDSSQWRQ QSRFDLTLSQ SLANYGNLFV SGSTQNYRGG KSRDTQLQLG YSNSFSHGIS 540 

MNLSVGRQRM GGYKDNSDDM QTVTSLSFSF PLGGNGPRVP SLSNSWTHST DGSSQLQSSL 600 

TGMLDEAQTT NYSLNVMRDQ QYKQTTLSGN MQKRFSQTTV GLNASKGQDY WQASGNVQGA 660 

MAVHGGGITF GPYLGETFAL VEAKGAEGAK VYNSSQLEIN DSGYALVPAV TPYRYNRISL 720 

DPQGMDGDAE LVDSERQVAP VAGAAVKVIF RTRPGKALLI KSRMADGSEL PMGADVLDEN 780 

NTWGIAGQG GQIYLRTEQT KGHLSVRWGE GANDSCQLPF DISGKDSNSP IIRLNETCQS 840 

<212> Type : PRT 
<211> Length : 840 

SequenceName : SEQ ID 40

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKLAACFLTL LPGFAVAASW TSPGFPAFSE QGTGTFVSHA QLPKGTRPLT LNFDQQCWQP 60 

ADAIKLNQML SLQPCSNTPP QWRLFRDGKY TLQIDTRSGT PTLMISIQNA AEPVANLVRE 12 0 

CPKWDGLPLT LDVSATFPEG AAVRDYYSQQ IAIVKNGQIT LQPAATSNGL LLLERAETDA 180 

SAPFDWHNAT VYFVLTDRFE KTGDPSNDQSY GRHKDGMAEI GTFHGGDLRG LTMKLDYLQQ 240

LGVNALWISA PFEQIHGWVG GGTKGDFPHY AYHGYYTQDW TNLDANMGNE ADLRTLVDSA 300

HQRGIRILFD WMNHTGYAT LADMQEYQFG ALYLSGDEVK KTLGERWSDW KPAAGQTWHS 360

FNDYINFSDK TGWDKWWGKN WIRTDIGDYD NPGFDDLTMS LAFLPDIKTE STTASGLPVF 420 

YKNKTDTHAK AIDGFTPRDY LTHWLSQWVR DYGIDGFRVD TAKHVEL PAW QQLKTEASAA 480 

LREWKKANPD KALDDKPFWM XGEAWGHGVM QSDYYRHGFD AMINFDYQEQ AAKAVDCIAQ 540 

MDTTWQQMAE KLQGFNVLSY LSSHDTRLFR EGGDKAAELL LLAPGAVQIF YGDESSRPFG 60 0 

PTGSDPLQGT RSDMNWQDVS GKSAANVAHW QKISQFRARH PAIGAGKQTT LSLKQGYGFV 660 

REHGDDKVLV IWAGQQ 676 
<212> Type : PRT 
<211> Length : 676 

SequenceName : SEQ ID 41 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MPQRHHQGHK RTPKQLALII KRCLPMVLTG SGMLCTTANA EEYYFDPIML ETTKSGMQTT 60 

DLSRFSKKYA QLPGTYQVDI WLNKKKVSQK KITFTANAEQ LLQPQFTVEQ LRELGIKVDE 12 0 

I PALAEKDDD SVINSLEQII I?GTAAEFDFN HQRLNLSIPQ IALYRDARGY VSPSRWDDGI 180 

PTLFTNYSFT GSDNRYRQGN RSQRQYLNMQ NGANFGPWRL RNYSTWTRND QASSWNTISS 240 

YLQRDIKALK SQLLLGESAT SGSIFSSYNF TGVQLASDDN MLPNSQRGFA PTVRGIANSS 300

AIVTIRQNGY VIYQSNVPAG AFEINDLYPS SNSGDLEVTI EESDGTQRRF IQPYSSLPMM 360

QRPGHLKYSA TAGRYRADAN SDSKEPEFAE ATAIYGLNNT FTLYGGLLGS EDYYALGIGI 420 

GGTLGALGAL SMDINRADTQ FDNQHSFHGY QWRTQYIKDI PETNTNIAVS YYRYTNDGYF 480 

SFDEANTRNW DYNSRQKSEI QFNISQTIFD GVSLYASGSQ QDYWGNNEKN RNISVGVSGQ 540 
QWGIGYSLNY QYSRYTDQNTtf DRALSLNLSI PLERWLPRSR VSYQMTSQKD RPTQHEMRLD 600

GSLLDDGRLS YSLEQSLDDD NNHNSSVNAS YRSPYGTFSA GYSYGNDSSQ YNYGVTGGW 660 

IHPHGVTLSQ YLGNAFALXD ANGASGVRIQ NYPGIATDPF GYAWPYLTT YQENRLSVDT 720 

TQLPDNVDLE QTTQ FWP3STR GAMVAARFNA NIGYRVLVTV SDRNGKPLPF GALASNDDTG 780

QQSIVDEGGI LYLSGISSKS QSWTVRWGNQ ADQQCQFAFS TPDSEPTTSV LQGTAQCH 838

<212> Type : PRT 
<211> Length : 838 

SequenceName : SEQ ID 42
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MMFRNRILLI FILWANFTWA GCRTTASLNI TDGINVGEIL ANETSFSKSV VFTGISCDTS 60 
TDKIVYKNIQ SDWVEVGPFG NGEKLKVKIE SLGKTSDTIG KSSNAQAVLP YWKIARGTP 120

DFTGERKSTW FISDTVIANT GGESSSSIDF WLGICKALKF NWCVUYLTSK LAGDTFTLGL 180
NISYYPKNTT CKPENTVIKV DDIALFQLRN QGKIAANSKE GTITLKCDNL FGDKKQASRN 240

MWYLSSSDL VKGSNTILRG KTDNGVGFVL DLTEPPKGTE AAIKISANGD QGAATSLWKT 300

DKPGVSLNSN IINIPVMASY YVYDEKKVKS GALEATALIN VKYD 344
<212> Type : PRT 
<211> Length : 344 

SequenceName : SEQ ID 43

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MIKKASLLTA CSVTAFSAWA QDTSPDTLW TANRFEQPRS TVLAPTTWT RQDIDRWQST 60

SVNDVLRRLP GVDITQNGGS GQLSSIFIRG TNASHVLVLI DGVRLNLAGG SGSADLSQFP 120

IALVQRVEYI RGPRSAVYGS DAIGGWNII TTRDEPGTEI SAGWGSNSYQ NYDVSTQQQL 180

GDKTRVTLLG DYAHTHGYDV VAYGNTGTQA QPDNDGFLSK TLYGALEHNF TDAWSGFVRG 240

YGYDNRTNYD AYYSPGSPLV DTRKLYSQSW DAGLRYNGEL IKSQLITSYS HSKDYLTYDPH 300

YGRYDSSATL DEMKQYTVQW ANNIIIGHGN VGAGVDWQKQ STAPGTAYVK DGYDQRNTGI 360
YLTGLQQVGD FTFEGAARSD DNSQFGRHGT WQTSAGWEFI EGYRFIASYG TSYKAPNLGQ 420

LYGFYGNPNL DPEKSKQWEG AFEGLTAGVN WRISGYRNDV SDLIDYDDHT LKYYNEGKAR 480

IKGVEATANF DTGPLTHTVS YDYVDARNAI TDTPLLRRAK QQVKYQLDWQ LYDFDWGITY 540

QYLGTRYDKD YSSYPYQTVK MGGVSLWDLA VAYPVTSHLT VRGKIANLFD KDYETVYGYQ 600

TAGREYTLSG SYTF 614 
<212> Type : PRT 
<211> Length : 614 

SequenceName : SEQ ID 44 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MKNKLLFMML TILGAPGIAA AAGYDLANSE YNFAVNELSK SSFNQAAIIG QAGTNNSAQL 60 
RQGGSKLLAV VAQEGSSNRA KIDQTGDYNL AYIDQAGSAN DAS ISQGAYG NTAMIIQKGS 120 
GNKANITQYG TQKTAIWQR QSQMAIRVTQ R 151 
<212> Type : PRT 
<211> Length : 151 

SequenceName : SEQ ID 45 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MNIFAYLLVL VFSMSMSSSA FASWMTGTR IIFPGDAKEK TIQLRNTSDQ PYIINIHVED 60 

ERGSDKNVPF MPTPQTFRME AAAGQALRLL YTGNNLPQDR ESVFWFSFSQ LPYLNKNDKS 120 

QNQLILALTN RVKIFYRPSS IVGKSSDAPK NLTYQVKQNR IEVTNPTGYY VTIRAAELLN 180 

NGKKVPLANS VMIAPQSTTE WTLPSGISVA PGAQIHLVTV NDYGVNVTSE HAL 233 
<212> Type : PRT 

<211> Length : 233 

SequenceName : SEQ ID 46 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MKRLHKRFLL ATFCALLTAT LQAADVTITV NGRWAKPCT IQTKEANVNL GDLYTRNLQQ 60 
PGSASGWHNI TLSLTDCPAE TSAVTAIVTG STDNTGYYKN EGTAENIQIE LRDDQDATLK 120 

NGDSKTVIVD EITRNAQFPL KARAITVNGN ASQGTIEALI NVIYTWQ 167 
<212> Type : PRT 
<211> Length : 167 

SequenceName : SEQ ID 47 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MRAKLLGIVL TTPIAISSFA STETLSFTPD NINADISLGT LSGKTKERVY LAEEGGRKVS 60 
QLDWKFNNAA IIKGAINWDL MPQISIGAAG WTTLGSRGGN MVDQDWMDSS NPGTWTDESR 120 

HPDTQLNYAN EFDLNIKGWL LNEPNYRLGL MAGYQESRYS FTARGGSYIY SSEEGFRDDI 180 
GSFPNGERAI GYKQRFKMPY IGLTGSYRYE DFELGGTFKY SGWVEASDND EHYDPGKRIT 240 
YRSKVKDQNY YSVSVNAGYY VTPNAKVYVE GTWNRVTNKK GNTSLYDHND NTSDYSKNGA 300 

GIENYNFITT AGLKYTF 317 
<212> Type : PRT 
<211> Length : 317 

SequenceName : SEQ ID 48 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MFFKRGKILS AGRLNKKSLG IVMLLSVGLL LAGCSGSKSS DTGTYSGSVY TVKRGDTLYR 60 
ISRTTGTSVK ELARLNGISP PYTIEVGQKL KLGGAKSSSS TRKSTAKSTT KTASVTPSSA 12 0 

VPKSSWPPVG QRCWLWPTTG KVIMPYSTAD GGNKGIDISA PRGTPIYAAG AGKWYVGNQ 180 
LRGYGNLIMI KHSEDYITAY AHNDTMLVNN GQSVKAGQKI ATMGSTDAAS VRLHFQIRYR 240 
ATAIDPLRYL PPQGSKPKC 259 

<212> Type : PRT 
<211> Length : 259 

SequenceName : SEQ ID 49 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MPTPNPLAPV KGAGTTLWVY NGNGDPYANP LSDNDWSRLA KVKDLTPGEL TAESYDDSYL 60 
DDEDADWAAT GQGQKSAGDT SFTLAWMPGE QGQQALLAWF NEGDTRAYKI RFPNGTVDVF 120 
RGWVSSIGKA VTAKEVITRT VKVTNVGRPS MAEDRSTVTA ATGMTVTPAS TSWKGQSTT 180 
LTVAFQPEGA TDKSFRAVSA DKTKATVSVS GMTITVKGVA AGKVNIPWS GNGEFAAVAE 240 

INVTAS 246 
<212> Type : PRT 
<211> Length : 246 

SequenceName : SEQ ID 50 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MSALYERSQL TQVMISSAPA TAETMDKAEY LRLDCTIKEV QFTAGQKQDI DVTTLCSTEQ 60 
ENINGLGASS EISMSGNFYL NQAQNALRDA YDNDALYAFK VLFPSGKGFK FLAEVRQHTW 120 
SSGTNGWAA TFSLRLKGKP VSFWPLAFV KNLDKTLTVN TGALLTMSVS ANGGTPPYKY 180 
AWKKDGQPVD GQTTDTFSKP GAQSADAGKY TCWTDSAEK AQSVTSVECT VTVSAAAG 238 

<212> Type : PRT 
<211> Length : 238 

SequenceName : SEQ ID 51 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MKKSTLALW MGIVASASVQ AAEIYNKDGN KLDVYGKVKA MHYMSDNDSK DGDQSYIRFG 60 

FKGETQINDQ LTGYGRWEAE FAGNKAESDT AQQKTRLAFA GLKYKDLGSF DYGRNLGALY 12 0 

DVEAWTDMFP EFGGDSSAQT DNFMTKRASG LATYRNTDFF GVIDGLNLTL QYQGTOSENRD 180 

VKKQNGDGFG TSLTYDFGGS DFAISGAYTN SDRTNEQNLQ SRGTGKRAEA WATGLKYDAN 240 

NIYLATFYSE TRKMTPITGG FANKTQNFEA VAQYQFDFGL RPSLGYVLSK GKDIEGIGDE 300 

DLVNYIDVGA TYYFNKNMSA FVDYKINQLD SDNKLNINND DIVAVGMTYQ F 351 



<212> Type : PRT 

<211> Length : 351 

SequenceName : SEQ ID 52 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MRVKHAWLL MLISPLSWAG TMTFQFRNPN FGGNPNNGAF LLNSAQAQNS YKDPSYNDDF 60 
GIETPSALDN FTQAIQSQIL GGLLSNINTG KPGRMVTNDY IVDIANRDGQ LQLNVTDRKT 12 0 

GQTSTIQVSG LQNNSTDF 138 
<212> Type : PRT 
<211> Length : 138 

SequenceName : SEQ ID 53 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MKRKVLAMLV PALLVAGAAN AAEIYNKDGN KLDLYGKVAG LHYFSDDASS DGDMSYARIG 60 
FKGETQIADQ FTGYGQWEFN IGANGPESDK GNTATRLAFA GFGFGQNGTF DYGRNYGWY 120 
DVEAWTDMLP EFGGDTYAGA DNFMNGRANS VATYRNNGFF GQVDGLNFAL QYQGNNEKSG 180 

LFDQEGSGNG NGRKLAKENG DGSVCPLPMT LTLV 214 
<212> Type : PRT 
<211> Length : 214 

SequenceName : SEQ ID 54 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MNTVTLEGGT FNNNGTLNDV VKIEKNSNAV X3STNTGSLSTL QLHDGTVNNS GIASARVNAQ 60 

GDAVFNNLAG GEARKGAILY NSAWNNAGT WKMGYQDENN NAGTLD I DDK STFNNSGKLI 12 0 

LDNSKNAIRF QGSNANATLY NTGEMTLDAA Li GAGAI L YDD GASEFINKGV VDAKVTVAVS 180 

TAGATESDAF LWNQDGGVIN FDKDNASAVK FTHNNYVALN DGVMN I S GNN AVAMEGDKNA 240 

QLVNNGVINL GTEGTTDTGL TGMQLDANAT ADAVIENNGT INI FANDS FA FSVLGTEGHI 3 00 

VNNGTWIAD GVTGSGLIKQ GDSVNVEGVN GISTS GNN T E YH YTDYTLPDMP NTYTTSPFSE 3 60 

TTDSGSSDGS SNNLNGY I VG TNVDGSAGKL ICVNNASMNGV GINTGFAAGT ADTTVS FDNV 42 0 

VEGINLTDAD AITSTSVWT AKGSTDASGN VDVIMSKNAY TDVATDASVN DVAKALDAGY 480 

TNNELYTSLN VGTTAELNSA LKQVSGSQAT TVF REARVL S NRFSMLADAA PKVGNGLAFN 540 

WAKGDPRAE LGNNTEYDML ALRKTVDLSE SQSMSLEYGI ARLDGDGAQK AGDNGVTGGY 600 

SQFFGLKHQM S FDNGMRWNN ALRYDVHNLD SSRSVAYGDV SKTADTDVKQ Q YLELRS EGA 660 

KTFEPREGLK ITPYAGVKLR HSLEGGYQER HAGDFNLSMN SGSETAVDSI VGLKLDYAGK 720 

GGWSANATLE GGPNLSYSKS QRTAS L AGAG SQHFNVDDGQ KGGGINSLAS VGVKYSSKES 780 

SLNLDAYHWK EDGI SDKGVM LNFKKTF 807 
<212> Type : PRT 
<211> Length : 807 

SequenceName : SEQ ID 55 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 
MLNGISNAAS TLGRQLVGIA SRVSSAGGTG FSVAPQAVRL TPVKVHSPFS PGSSNVNART 60 
IFNVSSQVTS FTPSRPAPPP PTSGQASGAS RPLPPIAQAL KEHLAAYEKS KGPEALGFKP 120 
ARQAPPPPTS GQASGASRPL PPIAQALKEH LAAYEKSKGP EALGFKPARQ APPPPTSGQA 180 
SGASRPLPPI AQALKEHLAA YEKSKGPEAL GFKPARQAPP PPTGPSGLPP LAQALKDHLA 240 
AYEQSKKG 248 
<212> Type : PRT 
<211> Length : 248 

SequenceName : SEQ ID 56 
SequenceDescription : 






Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MNKKIHSLAL LVNLGIYGVA QAQEPTDTPV SHDDTIWTA AEQNLQAPGV STITADEIRK 60 

NPVARDVSEI IRTMPGVNLT GNSTSGQRGN NRQIDIRGMG PENTLILIDG KPVSSRNSVR 12 0 

QGWRGERDTR GDTSWVPPEM IERIBVLRGP AAARYGNGAA GGWNIITKK GSGEWHGSWD 180 

AYFNAPEHKE EGATKRTNFS LTGPLGDEFS FRLYGNLDKT QADAWDINQG HQSARAGTYA 240 

TTLPAGREGV INKDINGWR WDFAPLQSLE LEAGYSRQGN LYAGDTQNTN SDAYTRSKYG 3 00 

DETNRLYRQN YSLTWNGGWD NGVTTSNWVQ YEHTRNSRIP EGLAGGTEGK FNEKATQDFV 3 60 

DNDLDDVMLH SEVNLPIDFL VNQTLTLGTE WNQQRMKDLS SNTQALTGTN TGGAIDGVSA 420 

TDRSPYSKAE IFSLFAENNM ELTDSTIVTP GLRFDHHSIV GNNWSPALNI SQGLGDDFTL 480 

KMGIARAYKA PSLYQTNPNY ILYSKGQGCY ASAGGCYLQG NDDLKAETSI NKEIGLEFKR 540 

DGWLAGITWF RNDYRNKIEA GYVAVGQNAV GTDLYQWDNV PKAWEGLEG SLNVPVSETV 600 

MWTNNITYML KSENKTTGDR LSIIPEYTLN STLSWQARED LSMQTTFTWY GKQQPKKYNY 660 

KGQPAVGPET KEISPYSIVG L S AT WD VTKN VSItTGGVDML FDKRLWRAGN AQTTGDLAGA 720 

NYIAGAGAYT YNEPGRTWYM SVNTHF 746 
<212> Type : PRT 
<211> Length : 746 

SequenceName : SEQ ID 57 
SequenceDescription : 






Sequence 



<213> OrganismName : Escherichia coli O157:H7 

<400> PreSequenceString : 
MGGRFSLRYK KLSYRFVFLT LAGCSSVGNQ SLKNETQESV KTKIVKGKTT KQDVLASFGE 60 

PDSRSLIDGE EQWSYTMYNS QSKATSFIPV VGLLAGGADS QTKSLTVSFK GEKVSTYIFN 12 0 

AGTSNVKTGI F 131 

<212> Type : PRT 

<211> Length : 131 
SequenceName : SEQ ID 58 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MKKIACLSAL AAVLAFTAGT SVAATSTVTG GYAQSDAQGQ MNKMGGFNLK YRYEEDNS PL 60 
GVIGSFTYTE KSRTASSGDY NKNQYYGITA GPAYR INDWA S I YGWGVGY GKFQTTEYPT 12 0 

YKHDTSDYGF SYGAGLQFNP MENVALDFSY EQSRIRSVDV GTWIAGVGYR F 171 


<212> Type : PRT 
<211> Length : 171 

SequenceName : SEQ ID 59 

SequenceDescription : 


Sequence 
<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MKS IATLWC AISGIACVNL SAHAAEGEHT I SLGYAHFQF P GL KD F VKD A TAHNRETFSH 60 
FVNRNYFSSL GEYTDGRVSG YEGKDKNPQG INIRYRYEIT D3DFGVITSFT WTRSLTNSQT 120 
FIDVQSADHT RKIKNPAASA RTDIRANYWS LLAGPSWRVN QYMSLYAMAG MGVAKVSADL 180 
KIKDNINSSG GFSESNSTKK TSLAWAAGAQ FNLNESVTLD V.AYEGSGSGD WRTSGVTAGI 240 
GLKF 244 
<212> Type : PRT 
<211> Length : 244 
SequenceName : SEQ ID 60 

SequenceDescription : 

Sequence 






<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MRKLYAAILS AAICLTVSGA PAWASEQQAT LSAGYLHVST KTAPGSDNLNG INVKYRYEFT 60 
DTLGLVTSFS YAGDRNRQ I T RYSDTRWHED SVRNRWFSVM A.GPSVRVNEW F S A YAMAGVA 12 0 

YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 18 0 

SGSGDWRTDG FIVGVGYKP 199 
<212> Type : PRT 
<211> Length : 199 

SequenceName : SEQ ID 61 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MRKLYAAILS AAICLAVSGA PAWASEQQAT LSAGYLHART SAPGSDNLNG INVKYRYEFT 60 
DTLGLVTSFS YAGDKNRQLT RYSDTRWHED SVRNRWFSVM AGPSVRVNEW FSAYAMAGVA 120 

YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 180 
SGSGDWRTDG FIVGVGYKF 199 
<212> Type : PRT 

<211> Length : 199 

SequenceName : SEQ ID 62 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MRKLYAAILS AAICLAVSGA PAWASEQQAT LSAGYLHART SAPGSDNLNG INVKYRYEFT 60 

DTLGLVTSFS YAGDKNRQLT RYSDTRWHED SVRNRWFSVM AGPSVRVNEW FSAYAMAGVA 120 
YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 180 

SGSGDWRTDG FIVGVGYKF 199 

<212> Type : PRT 

<211> Length : 199 

SequenceName : SEQ ID 63 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 

<400> PreSequenceString : 

MRKLYAAILS AAICLAVSGA PAWASEQQAT LSAGYLHART SAPGSDNLNG INVKYRYEFT 60 
DTLGLVTSFS YAGDKNRQLT RYSDTRWHED SVRNRWFSVM AGPSVRVNEW FSAYAMAGVA 12 0 

YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 18 0 

SGSGDWRTDG FIVGVGYKF 199 

<212> Type : PRT 

<211> Length : 199 

SequenceName : SEQ ID 64 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 






<400> PreSequenceString : 

MVMSQKTLFT KSALAVAVAL ISTQAWSAGF QLNEFSSSGL GRAYSGEGAI ADDAGNVSRN 60 
PALITMFDRP TFSAGAVYID PDVNISGTSP SGRSLKADNI APTAWVPNMH FVAPINDQFG 120 

WGASITSNYG LATEFNDTYA GGSVGGTTDL ETMNLNLSGA YRLNNAWSFG LGFNAVYARA 180 
KIERFAGDLG QLVAGQIMQS PAGKTPQGQA LAATANGIDS NTKIAHLNGN QWGFGWNAGI 240 
LYELDKNNRY ALTYRSEVKI DFKGNYSSDL NRVFNNYGLP IPTATGGATQ SGYLTLNLPE 300 

MWEVSGYNRV DPQWAIHYSL AYTSWSQFQQ LKATSTSGDT LFQKHEGFKD AYRIALGTTY 360 
YYDDNWTFRT GIAFDDSPVP AQNRSISIPD QDRFWLSAGT TYAFNKDASV DVGVSYMHGQ 420 
SVKINEGPYQ FESEGKAWLF GTNFNYAF 448 
<212> Type : PRT 

<211> Length : 448 

SequenceName : SEQ ID 65 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MAFSQAVSGL NAAATNLDVI GNNIANSATY GFKSGTASFA DMFAGSKVGL GVKVAGITQD 60 

FTDGTTTNTG RGLDVAISQN GFFRLVDSNG SVFYSRNGQF KLDENRNLVN MQGLQLTGYP 120 

ATGTPPTIQQ GANPTNISIP NTLMAAKTTT TASMQINLNS SDPLPSVNAF DASNADSYNK 180 

KGSVTVFDSQ GNAHDMSVYF VKTGDNNWQV YTQDSSDPTG TAEPAMKLVF NANGVLTSNP 240 

TENITTGAIN GAEPATFSLS FLNSMQQNTG ANNIVATTQN GYKPGDLVSY QINDDGTWG 3 00 

NYSNEQTQLL GQIVLANFAN NEGLASEGDN VWSATQSSGV ALLGTAGTGN FGTLTNGALE 360 

ASNVDLSKEL VNMIVAQRNY QSNAQTIKTQ DQILNTLVNL R 401 



<212> Type : PRT 

<211> Length : 401 

SequenceName : SEQ ID 66 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MSKSTFLHIL ISSIILVALI QSSAWANCTN TQIGQTEDGR TALIEFGKIN MTDTYFAPAG 60 
SLLATTWPP TNYTSGGATG SSVLWECDAT DLPISTIYFLVA TNGDDRVGGF YDAGGPDGLS 12 0 

DVYATWFAFV GLKQTMAGVT LGRYWKKVP I TSYA-TQGTKI QIRLQDIPPL HAELYRI STL 180 
PDTSATTSWC GNNNTDSSGV GFAKPSGTIY NCVQPNAYIQ LSGTSGILFG HDEPGEDSSV 240 
HWDFWGADNG FGYGMRSANR LYNNATCVAR SATPLVLLPT IAEAQLNAGM ESTGNFNVRV 3 00 

ECSNSVQSGI SDTQTALGIQ VSEGAYTAAQ KLGXINSNGG VSALVSDNYD AAEMAKGVGI 360 

YISNSAHPDT AMTLVGQPGI AKLTPGGNAA GWYPVFEGAT LEGATHPGYS SYSYSFIARL 420 

KKLPNQTVSA GKVRATAYIL VKMQ 444 
<212> Type : PRT 
<211> Length : 444 

SequenceName : SEQ ID 67 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MENNRNFPAR QFHSLTFFAG LCIGITPVAQ ALAA.EGQTNA DDTLWEAST PSLYAPQQSA 60 

DPKFSRPVAD TTRTMTVISE QVIKDQGATN LTDALKNVPG VGAF FAGENG NSTTGDAIYM 120 

RGADTSNS I Y IDGIRDIGSV SRDTFNTEQV EVIKGPSGTD YGRSAPTGS I NM I S KQPRND 180 

SGIDASASIG SAWFRRGTLD VNQVIGDTTA VRLNVMGEKT HDAGRDKVKN ERYGVAPSIA 240 

FGLGTANRLY LNYLHVTQHN TPDGGIPTIG LPGYSAPSAG TATLNHSGKV DTHNFYGTDS 3 00 

DYDDSTTDTA TMRFEHDIND NTTIRNTTRW SRVKQDYLMT AIMGGASNIT QPTSDVNSWT 360 

WSRTANTKDV SNKILTNQTN LTSTFYTASI GHDVSTGVEF TRETQTNYGV NPVTLPAVNI 420 

YHPDSSIHPG GLTRNGANAN GQTDTFAIYA FDTL.QITRDF ELNGGI RLDN YHT E YDS ATA 480 

CGGSGRGAIT CPAGVAKGSP VTTVDTAKSG NLVNWKAGAL YHLTENGNVY INYAVSQQPP 540 

GGNNFALAQS GSGNSANRTD FKPQKANTSE IGTKWQVLDK RLLLTAALFR TDIENEVEQN 600 

DDGTYSQYGK KRVEGYEISV AGNITPAWQV I GG Y/TQQKAT I KNGKD VAQD GSSSLPYTPE 660 

HAFTLWSQYQ ATDDISVGAG ARYIGSMHKG SDGAVGTPAF TEGYWVADAK LGYRVNRNLD 720 

FQLNVYNLFD TDYVASINKS GYRYHPGEPR TFLLTANMHF 760 

<212> Type : PRT 

<211> Length : 760 

SequenceName : SEQ ID 68 
SequenceDescription : 
Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MQMKKLLPIL IGLSLSGFSS LSQAENLMQV YQQARIiSNPE LRKSAADRDA AFEKINEARS 60 

PLLPQLGLGA DYTYSNGYRD ANGINSNATS ASLQLTQSIF DMSKWRALTL QEKAAGIQDV - 120 

TYQTDQQTLI LNTATAYFNV LNAIDVLSYT QAQKEAIYRQ LDQTTQRFNV GLVAITDVQN 180 

ARAQYDTVLA NEVTARNNLD NAVEQLRQIT GNYYPELAAL NVEMFKTDKP QPVNALLKEA 240 

EKRNLSLLQA RLSQDIiAREQ I RQAQDGHL P TLDLTASSGI SDTSYSGSKT RGAAGTQYDD 3 00 

SNMGQNKVGL SFSLPIYQGG MVNSQVKQAQ YNFVGASEQL ESAHRSWQT VRSSFNNINA 3 60 

SISSINAYKQ AWSAQSSLD AMEAGYSVGT RTIVDVLDAT TTLYNAKQEL ANARYNYLIN 420 

QLNIKSALGT LKEQDLLALN NALSKPtfSTN PENVAPQTPE QN&IADGYAP DSE&PVYQQT 480 

SARTTTSNGH NPFRN 495 



<212> Type : PRT 

<211> Length : 495 

SequenceName : SEQ ID 69 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MTKLKLLALG VLIATSAGVA HAEGKFSLGA GVGWEHPYK DYDTDVYPVP VINYEGDNFW 60 
FRGLGGGYYL WNDATDKLSI TAYWSPLYFK AKDSGDHQMR HLDDRKS TMM AGLSYAHFTQ 120 
YGYLRTTLAG DTLDNSNGIV WDMAWLYRYT NGGLTVTPGI GVQWNSENQN EYYYGVSRKE 180 
SARSGLRGYN SNDSWSPYLE LSASYNFLGD WSVYGTARYT RLSDEVTDSP IVDKSWTGLI 240 
STGITYKF 248 

<212> Type : PRT 

<211> Length : 248 

SequenceName : SEQ ID 70 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MKKTLLAAGA VLALSSSFTV NAAENDKPQY LSDWWHQSVN WGSYHTRFG PQIRNDTYLE 60 

YEAFAKKDWF DFYGYADAPV FFGGNSDAKG IWNHGSPLFM EIEPRFSIDK LTNTDLSFGP 120 

FKEWYFANNY IYDMGRNKDG RQSTWYMGLG TDIDTGLPMS LSMNVYAKYQ WQNYGAANEN 180 

EWDGYRFKIK YFVP I TDLWG GQLSYIGFTN FDWGSDLGDD SGNAINGIKT RTNNSIASSH 240 

ILALNYDHWH YSWARYWHD GGQWNDDAEL NFGNGNFNVR STGWGGYLW GYNF 2 94 

<212> Type : PRT 

<211> Length : 294 

SequenceName : SEQ ID 71 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MLSTQFNRDN QYQAITKPSL LAGCIALALL PSAAFAAPAT EETVIVEGSA TAPDDGENDY 60 

SVTSTSAGTK MQMTQRDIPQ SVTIVSQQRM EDQQLQTLGE VMENTLGISK SQADSDRALY 120 

YSRGFQIDNY MVDGIPTYFE SRWNLGDALS DMALFERVEV VRGATGLMTG TGNPSAAINM 180 

VRKHAT S RE F KGDVSAEYGS WNKERYVADL QSPLTEDGKI RARIVGGYQN NDSWLDRYNS 240 

EKTFFSGIVD ADLGDLTTLS AGYEYQRIDV NSPTWGGLPR WNTDGSSNSY DRARS TAPDW 3 00 

AYNDKE INKV FMTLKQRFAD TWQATLNATH SEVEFDSKMM YVDAYVNKAD GMLVGPYSNY 360 

GPGFDYVGGT GWNSGKRKVD ALDLFADGSY ELFGRQHNLM FGGSYSKQNN RYFSSWANIF 420 

PDEIGSFYNF NGNFPQTDWS PQSLAQDDTT HMKSLYAATR VTLADPLHLI LGARYTNWRV 4 80 

DTLTYSMEKN HTTPYAGLVF DINDNWSTYA SYTSIFQPQN DRDSSGKYLA PITGNNYELG 540 

LKSDWMNSRL TTTLAIFRIE QDNVAQSTGT PIPGSNGETA YKAVDGTVSK GVEFELNGAI 600 

TDNWQLTFGA TRYIAEDNEG NAVNPNLPRT TVKMFTSYRL PVMPELTVGG GVNWQNRVYT 660 

DTVTPYGTFR AEQGSYALVD LFTRYQVTKN FSLQGNVNNL FDKTYDTNVE GSIVYGAPRN 720 

FSITGTYQF 729 
<212> Type : PRT 
<211> Length : 729 

SequenceName : SEQ ID 72 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MARFQFKNRK NNGLIFFISF MVMGEAAIAA PLPQWANAPA VTPVAQLSLQ ESILRAFARN 60 
PGVTQQAAQI GIGEAQIDEA KSAWYPHVGL TGNAGPSRQT DSSGRLDNNV SYGITLTQLV 120 
YDFGKTNNDI NLQTAARDSY RFKLMATLTD VAEKTATAYM EVSRYQALCD AAQRNIHSLE 180 
NVYNMAALRA NAGLNSSSDE LQAQTRIAGM RSTLEQYQAQ MA S AKAQLAV LTGVQPEAIA 240 
APPAELAEQP VSLKNIDYQS IPLVLAAENL RQSAQYGVEK TKAQYWPTLS IQGGKTRYQT 3 00 

SDRSYWDDQL QLNVNAPLYQ GGAVSAQVQQ AEGQQKISAS QVEQAKLDVL QRASVAYANW 3 60 

TGARGREEAG LAQSESAHKT RDVYQNEYKL GKRSLNDLLT VEQDVFQAQS AEINANYDGW 420 
VAAVNYAAAV NNLIPLAGIK QGLYNDLPDL K 451 
<212> Type : PRT 
<211> Length : 451 

SequenceName : SEQ ID 73 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 

<400> PreSequenceString : 

MAKFTPSFSG IKGRALFSLL FAAPM IHATD TATTKDGETI TVTADANTAT EATDGYQPLS 60 

TSTATLTDMP MLDIPQWNT VSDQVLENQN ATTLDEALYN VSNWQTNTL GGTQDAFVRR 120 

GFGANRDGSI MTNGLRTVLP RSFNAATERV EVLKGPASTL YGILDPGGLX NWTKRPEKT 18 0 

FHGSVSATSS SFGGGTGQLD ITGPIEGTQL AYRLTGEVQD EDYWRNFGKE RSTFIAPSLT 240 

WFGDNATVTM LYSHRDYKTP FDRGTIFDLT TKQPVNVDRK IRFDEPFNIT DGQSDLAQLN 300 

AEYHLNSQWT ARFDYSYSQD KYSDNQARVT AYDATTGTLT RRVDATQGSX QRMHS TRADL 3 60 

QGNVDIAGFY NEILGGVSYE YYDLLRTDMI RCKNAKDFNI YNPVYGNTSK CTTVSASDSD 420 

QTIKQESYSA YAQDALYLTD NWIAVAGIRY QYYTQYAGKG RPFNVNTDSR DEQWTPKLGL 480 

VYKLTPSVSL FANYSQTFMP QSSIASYIGD LPPESSNAYE VGAKFELFDG ITADIALFDI 540 

HKRNVLYTES IGDETIAKTA GRVRSRGVEV DLAGALTENI NIIASYGYTD AKVLEDPDYA 600 

GKPLPNVPRH TGSLFLT YD I HNMPGNNTLT FGGGGHCVSR RSATNGADYY LPGYFVADAF 660 

AAYKMKLQYP VTLQLNVKNL FDKTYYTSSI ATNNLGNQIG DPREVQFTVK MEF 713 



<212> Type : PRT 
<211> Length : 713 

SequenceName : SEQ ID 74 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MRTLQGWLLP VFMLPMAVYA QEATVKEVHD APAVRGSIIA NMLQEHDNPF TLYPYDTNYL 60 
IYTQTSDLNK EAIASYDWAE NARKDEVKFQ LSLAFPLWRG ILGPNSVLGA SYTQKSWWQL 120 

SNSEESSPFR ETNYEPQLFL GFATDYRFAG WTLRDVEMGY NHDSNGRSDP TSRSWNRLYT 180 
RLMAENGNWL VEVKPWYWG NTDDNPDITK YMGYYQLKIG YHL GDAVL SA KGQYNWNTGY 240 
GGAELGLSYP ITKHVRLYTQ VYSGYGESLI DYNFNQTRVG VGVMLNDLF 289 
<212> Type : PRT 
<211> Length : 289 

SequenceName : SEQ ID 75 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MAVQKNVIKG I LAGTFALML SGCVTVPDAI KGSSPTPQQD LVRVMSAPQL YVGQEARFGG 60 
KWAVQNQQG KTRLEIATVP LDSGARPTLG EPSRGRIYAD VNGFLDPVDF RGQLVTWGP 120 
ITGAVDGKIG NTPYKFMVMQ ATGYKRWHLT QQVIMPPQPI DPWFYGGRGW PYGHGGWGWY 18 0 

NPGPARVQTV VTE 193 
<212> Type : PRT 
<211> Length : 193 
SequenceName : SEQ ID 76 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MRKQWLGICI AAGMLAACT S DDGQQQTVSV PQPAVCNGPI VEISGADPRF EPLNATANQD 60 
YQRDGKSYKI VQDPSRFIQA GLAAIYDAEP GSNLTAS GEA FDPTQLTAAH PTLPIPSYAR 120 
ITNLANGRMI WRINDRGPY GNDRVISLSR AAADRLNTSN NTKVRIDPII VAQDGSLSGP 180 
GMACTTVAKQ TYALPAPPDL SGGAGTSSVS GPQGDILPVS NSTLKSEDPT GAPVTSSGFL 240 
GAPTTLAPGV LEGSEPTPAP QPVVTAPSTT PATSPAMVTP QAASQSASGN FMVQVGAVSD 3 00 

QARAQQYQQQ LGQKFGVPGR VTQNGAVWR I QLGPFANKAE ASTLQQRLQT EAQLQSFITT 3 60 

AQ 362 

<212> Type : PRT 

<211> Length : 362 

SequenceName : SEQ ID 77 

SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MIKRVLWSM VGLSLVGCVN NDTLSGDVYT AS EAKQVQNV SYGTIVNVRP VQI QGGDDSN 60 
VIGAIGGAVL GGFLGNTVGG GTGRSLATAA GAVAGGVAGQ GVQSAMNKTQ GVELEIRKDD 120 

GNTIMWQKQ GNTRFSPGQR WLASNGSQV TVSPR 155 

<212> Type : PRT 

<211> Length : 155 

SequenceName : SEQ ID 78 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 

<400> PreSequenceString : 

MSKATEQNDK LKRAIIISAV LHVILFAALI WSSFDENIEA SAGGGGGSSI DAVMVDSGAV 60 

VEQYKRMQSQ ESSAKRSDEQ RKMKEQQAAE ELREKQAAEQ ERLKQLEKER LAAQEQKKQA 120 

EEAAKQAELK QKQAEEAAAK AAADAKAKAE ADDKAAEEAA KKAAADAKKK AEAEAAKAAA 180 

EAQKKAEAAA AALKKKAEAA EAAAAEARKK AAAEKAAADK KAAEKAAAEK AAADKKAAAE 240 

KAAADKKAAA AKAAAEKAAA AKAAAEADDI FGELSSGKNA PKTGGGAKGN NASPAGSGNT 300 

KNNGASGADI NNYAGQIKSA IESKFYDASS YAGKTCTLRI KLAPDGMLLD IKPEGGDPAL 360 

CQAALAAAKL AKIPKPPSQA VYEVFKNAPL DFKP 394 
<212> Type : PRT 
<211> Length : 394 

SequenceName : SEQ ID 79 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

MMKFKKCLLP VAMLASFTLA GCQSNADDHA ADVYQTDQLN TKQE TKTVN I ISILPAKVAV 60 
DNSQNKRNAQ AFGAL I GAVA GGVI GHNVGS GSNSGTTAGA VGGGAVGAAA GSMVNDKTLV 120 
EGVSLTYKEG TKVYTSTQVG KECQFTTGLA WITTTYNET RIQPNTKCPE KS 172 


<212> Type : PRT 
<211> Length : 172 

SequenceName : SEQ ID 80 

SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 
MLLSIITVAF RNLEGIVKTH ASLAHLAQAE DISFEWIWD GGSNDGTREY LENLNGIYNL 60 

RFVSEPDNGI YDAMNKGIAM AQGKFAL FLN SGDIFHQDAA YFVRKLKMQK DNVMI TGDAL 120 
LDFGDGHKIK RSAKPGWYIY HSLPASHQAI FFPVSGLKKW RYDLEYKVSS DYALAAKMYK 180 
AGYAFKKLNG LVSEFSMGGV STTNNMELCA DAKKVQRQIL HVPGFWAELS WHLRQRTTSK 240 
TKALYNKS 248 
<212> Type : PRT 
<211> Length : 248 
SequenceName : SEQ ID 81 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 

MKLTTLQTLK KGFTLIELMI VIAIIAILAT IAIPSYQNYT KKAAVSELLQ ASAPYKADVE 60 
LCVYSTNETT SCTGGKNGIA AD I KTAKGYV ASVITQSGGI TVKGNGTLAN ME Y I LQAKGtT 120 
AAAGVTWTTT CKGTDASLFP ANFCGSVTK 149 

<212> Type : PRT 

<211> Length : 149 

SequenceName : SEQ ID 82 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 

MLNKKFKLNF IALTVAYALT PYTEAALVRD DVDYQI FRDF AENKGRF SVG ATNVEVRDKU 60 

NHSLGNVLPN GIPMIDFSW DVDKRIATLI NPQYWGVKH VSNGVSELHF GNLNGNMNNG 120 

NAKSHRDVSS EENRYFSVEK NEYPTKLNGK AVTTEDQTQK RREDYYMPRL DKFVTEVAPX 180 

EASTASSDAG TYNDQNKYPA FVRLGSGSQF I YKKGDNYS L I LNNHE VGGN NLKLVGDAYT* 240 

YGIAGTPYKV NHENNGLIGF GNSKEEHSDP KGILSQDPLT NYAVLGDSGS PLFVYDREKG- 3 00 

KWLFLGSYDF WAGYNKKSWQ EWNIYKPEFA KTVLDKDTAG SLTGSNTQYN WNPTGKTSVX 3 60 

SNGSESLNVD LFDSSQDTDS KKNNHGKSVT LRGSGTLTLN NNIDQGAGGL FFEGDYEVKG 420 

TSDSTTWKGA GVSVADGKTV TWKVHNPKSD RLAKI GKGTL I VEGKGENKG SLKVGDGTVX 48 0 

LKQQADANNK VKAFSQVGIV SGRSTWLND DKQVDPNSIY FGF RGGRLDA NGNNLTFEHX 540 

RNIDDGARLV NHNTSKTSTV- TITGESLITD PNTITPYNID APDEDNPYAF RRI KDGGQL Y" 600 

LNLENYTYYA LRKGASTRSE LPKNSGESNE NWLYMGKTSD EAKRNVMNHI NNERMNGFNG 660 

YFGEEEGKNN GNLNVTFKGK SEQNRFLLTG GTNLNGDLKV EKGTLFLSGR PTPHARDIAG 720 

ISSTKKDQHF AENNEWVED DWINRNFKAT NINVTNNATL YSGRNVANIT SNITASDNASC 780 

VHIGYKAGDT VCVRSDYTGY VTCTTDKLSD KALNS FNATN VSGNVNLSGN ANFVLGKANH. 840 

FGTISGTGNS QVRLTENSHW HLTGDSNVNQ LNLDKGHIHL NAQNDANKVT TYNTLTVN"SI_» 900 

SGNGSFYYLT DLSNKQGDKV WTKSATGNF TLQVADKTGE PTKNELTLFD ASNATRNNL1ST 960 

VSLVGNTVDL GAWKYKLRNV NGRYDLYNPE VEKRNQTVDT TNITTPNMIQ ADVPSVPSNTST 1020 

EE I ARVETPV PPPAPATPSE TTETVAENSK QESKTVEKNE QDATETTAQN GEVAEEAKPS 1080 

VKANTQTNEV AQSGSETEET QTTE I KETAK VEKEEKAKVE KDEIQEAPQM ASETSPKQAK 1140 

PAPKEVSTDT KVEETQVQAQ PQTQSTTVAA AEATSPNSKP AEETQPSEKT NAEPVTPWS 12 00 

KNQTENTTDQ PTEREKTAKV ETEKTQEPPQ VASQASPKQE QSETVQPQAV LESENVPTWT 1260 

NAEEVQAQLQ TQTSATVSTK QPAPENSINT GSATAITETA EKSDKPQTET AASTEDASQHE 1320 

KANTVADNSV ANNSESSDPK SRRRRSISQP QETSAEETTA ASTDETTIAD NSKRSKPNRR 13 80 

SRRSVRSEPT VTNGSDRSTV ALRDLTSTNT NAVISDAMAK AQFVALNVGK AVSQHISQLE 1440 

MNNEGQYNW VSNTSMNENY SSSQYRRFSS KSTQTQLGWD QTISNNVQLG GVFTYVRNSIST 1500 

NFDKASSKNT LAQVNFYSKY YADNHWYLGI DLGYGKFQSN LKTNHNAKFA RHT AQ F GL» TJ\ 1560 

GKAFNLGNFG ITPIVGVRYS YLSNANFALA KDRIKVNPIS VKTAFAQVDL SYTYHLGEFS 1620 

VTPILSARYD TNQGSGKINV NQ YD FAYNVE NQQQYNAGLK LKYHNVKLSL IGGLTKAKQA 168 0 

EKQKTAELKL SFSF 1694 
<212> Type : PRT 
<211> Length : 1694 

SequenceName : SEQ ID 83 

SequenceDescription : 



Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 

MALVNKI KTL SSVGILAATL FLAGCQAQSN ILAFTPPAPS AS MNVNRTAV VSVTTKDSRA 60 
IQEIASYTKH GELIKLNASP SVTQLFQQVM QQNLISKGFR VGQLNGSNAW VTVDVREFGT 120 
QVEQGNLRYK LNTKIQATVY VQGAKGSYNK SFNVTHSQEG VFNAGNDEIH KVLSQTFNDI 180 
VNNIYQDQEV AAAINQYSN 199 

<212> Type : PRT 
<211> Length : 199 
SequenceName : SEQ ID 84 
SequenceDescription : 

Sequence 

<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 

MLCWIGYKNG ILPQQNSTLY PWLNPSKCGV IFDGFQLVGD DFNSDQTAEN TSPAWQVLYT 60 
THLQSCSPIH SGENFAPIPL YKQLKNQPHL SQDLIKWQEN WQACDQLQMN GAVLEQQSLA 120 
EISDHQSTLS KHGRYLiAQE I EKETGIPTYY YLYRVGGQSL ESEKSRCCPS CGANWALKDA 18 0 

IFDTFHFKCD TCRLVSNLSW NFL 203 

<212> Type : PRT 
<211> Length : 203 

SequenceName : SEQ ID 85 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 

MGAFAFASVT NAN I YAEGD I GLSQTKANGS NNTRVGPRVS VGYKVGNTRV AGDYTHHGKV 60 
DGTKIQGLGA SVLYDFDTNS KVQPYVGARV ATNQFKYTNR AEQKFKSSSD IKLGYGWAG 12 0 

AKYKLDGNWY ANGGVEYNRL GNFDSTKVNN YGAKVGVGYG F 161 
<212> Type : PRT 
<211> Length : 161 

SequenceName : SEQ ID 86 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 

MKKLLIASLL FGTTTTVFAA PFVAKDIRVD GVQGDLEQQI RAS LPVRAGQ RVTDNDVANI 60 

VRSLFVSGRF DDVKAHQEGD VLWSWAKS IISDVKIKGN SIIPTEALKQ NLDANGFKVG 120 

DVLIREKLNE FAKSVKEHYA SVGRYNATVE PIVNTLPNNR AEILIQINED DKAKLASLTF 180 

KGNESVSSST LQEQMELQPD SWWiCLWGNKF EGAQFEKDLQ SIRDYYLNNG YAKAQITKTD 240 

VQLNDEKTKV NVTIDVNEGL QYDLRSARII GNLGGMSAEL EPLLSALHLN DTFRRSDIAD 3 00 

VENAIKAKLG ERGYGSATVN SVPDFDDANK TLAITLWDA GRRLTVRQLR FEG3KTTV SAD S 3 60 

TLRQEMRQQE GTWYNSQLVE LGKIRLDRTG FFETVENRID PINGSNDEVD WYKVKERNT 420 

GSINFGIGYG TESGISYQAS VKQDNFLGTG AAVS I AGTKN DYGTSVNLGY TEPYFTKDGV 48 0 

SLGGNVFFEN YDNSKSDTSS NYKRTTYGSN VTLGF PVNEN NSYYVGLGHT YNKISNFALE 540 

YNRNLYIQSM KFKGNGIKTN DFDFS FGWNY NSLNRGYFPT KGVKASLGGR VTIPGSDNKY 60 0 

YKLSADVQGF YPLDRDHLWV VSAKASAGYA NGFGNKRLPF YQTYTAGGIG SLRGFAYGSI 660 

GPNAI YAEHG NGNGTFKKIS SDVIGGNAIT TASAELIVPT PFVSDKSQNT VRTSLFVDAA 72 0 

SVWNTKWKSD KSGLDNNVLK SLPDYGKSSr" IRASTGVGFQ WQSPIGPLVF SYAKPIKKYE 780 

NDDVEQFQFS IGGSF 795 
<212> Type : PRT 
<211> Length : 795 

SequenceName : SEQ ID 87 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 

MLKKTSLIFT ALLMTGCVQN ANVTTPQAQK MQVEKVDKAL QKGEADRYLC QDDRWRWH 6 0 

ATHKKYKKNL HYVTVTFQGV SEKLTLMISE RGKNYANIRW MWQERDDFST LKT3STLGEILA 120 
TQCVSQTSER LSGQ 134 
<212> Type : PRT 
<211> Length : 134 

SequenceName : SEQ ID 88 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString : 
MRIIIIFFMG LNMTNFRLER ACLFRYAWAN GRCCLCSSTN QPTNQPTNQP TNQPTNQPTN 60 

QPTNQNSNVS EQLEQINVSG STENSDTKTP PKIAETVKTA KTLEREQANN IKDIVKYETG 120 

VTWEAGRFG QSGFAIRGVD ENRVAINIDG LRQAETLSSQ GFKELFEGYG NFNNTRNGAE 18 0 

IETLKEVNIT KGADSIKNGS GSLGGSVIYK TKDARDYLIN KDYYVSYKKG YATENNQSFD 240 

TLTLAGRYKK FDVLWTTSR NGHELENYGY KNYNDKIQGK KREKADPYKI EQDSTLLKLS 3 00 

FNPTENHRFT FAADLYEHRS RGQDLSYTLK YQRSGNETPE VDSRHTNDKT KRRNISFSYE 3 60 

NFSQTPFWDT LKLTYSDQRI KTRARTDEYC DAGVRHCEGT DNPTGLKVTN GKITRRDGSD 42 0 

LQFEEKNNTA KSSDKTYDFK KFIDTDKRVI DDKLVLNNP S DTWYDCSIFN CENNAKIKVF 480 

KGNNYYGYDG KWKEVDLEIK ELNGKKFAKI KDNDRKI KS I LPSSPGYLER LWQERDLDTN 540 

TQQLNLDLTK DFKIWHIEHN LQYGGSYNTA MKRMVNRAGN DASDVQWWAT PTLGEDSWTG 600 

KPHTCATTYE WNANLCPRVD PEFSYLLPIK TTGKSVYLFD NFVITDYLSF DLGYRYDNIH 660 

YQPKYKHGIT PKLPDDIVKG LFIPLPNNSN SDPNKVKENV QQNIDYIAKQ NKKYKAHSYS 720 

FVSTIDPTSF LRLQLKYSKG FRTPTSDEMY FTFKHPDFTI LPNTDLKPEI AKTKEIAFTL 780 

HNDDWGFIST SLFKTNYKNF IDLIFKKQET FSYGGSGRGE TLPFSLYQNI NRDNASI*ICGI 840 

EINSKVFLGK MAKFMDGFNL SYKYTYQKGR MNGNIPMNAI QPRTMVYGLG YDHPNHKFGF 900 

DFYTTHVASK NPEDTYNMFY KEENKKDSTI KWRSKSYTIL DLIGYVQPIK NLTIRAGVYN 960 

LTNRKYITWD SARSIRSFGT SNVIDQSTGL GINRFYAPGR NYKMSVQFEF 1010 
<212> Type : PRT 
<211> Length : 1010 

SequenceName : SEQ ID 89 

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MTYRNGKIDL KERFS KNRS F KGIKKKIAKK YTIKNSLSII YSLKTHSNSS LSINKKIFLG 60 
LGFVSALSAQ SEDYNSSVYW LNSVNENNNN KSYYISPLRT WAGGNRS FTQ NYNNSQLYIG 12 0 

TKNASATPNH SSVWFGEKGY IGFITGVFKA RDIFITGAVG SGNELKTGGG AILVFES SNE 18 0 

LTTNGAYFQN NRAGTQTSWI NLISNNSVNL TNTDFGNQTP NGGFNVMGRK I T YNGGSVNG 240 
GNFGFDNVDS NGATTISGVT FNNNGALTYK GGNGIGGSIT FTNSNINHYK LNLNANSVTF 3 00 

NNSTLGSMPN GNANTIGNAY ILNANNITFN NLTFNGGWFV FNRSDAHVNF QGTTTINNPT 3 60 

SPFVNMTGKV TINPNAI FNI QNYTPTIGNA YTLFSMKNGN IAYDDVNNLW Nil RLKNTQA 42 0 

TKDNSKNATS NNNTHTYYVT YNLGGTLYHF RQIFSPDSIV LQSVYYGANN LYYTNSVNIH 480 
DNVFNLKNIN DDRADTIFYL NGLNTWNYTQ ARFAQTYGGK NSALVFNATT PWANGAIPKS 540 
NSTVRFGGYE GVNWGKTGYI TGTFTADRVY ITGNMMSGNG AQTGGGATLN FVGATEINIA 600 
GATFKNLKTT SQNSYMTFMA LGNGSGSGKI NVSQSDFYDW TDGGYDFTGN GVFDSVNFNK 660 
AYYKFQGAEN SYNFKNTNFL AGNFKFQGKT TIEKSVLNDA SYAFDGVNNA FNEDKFNGGS 720 
FNFNHAEQTN AFNNNSFSGG SFSFNAKQVD FNGNS FNGGV FNFNNTPKAS FTNDTFNVNN 780 
QFKINGAQTD FTFS KGWFN MQGLLSSLSV GTTYQLLNAK SVGYKDNNNA LYQMLRWTSG 840 
ENPSGKLVDE NKTAPNSAKI YNVQFTDNGL TYYIKENFNN GITLTRLCTL GYTHCVNIDN 900 
DAFNLKNVNN NASNTVFYLN GMTTWKTAGT GVFTQDYSGT NSVLVFNQTT PFLAGANPTS 960 

NSWGFGKTS GAEWGLVGYI QGVFKANQID ITGTIRSGNG AKTGGGATLV FNAQERLNIA 102 0 

NANLNNDKAG LQNSWMNFIV NNGNLNVTNA NFSNQTPHGG FNLKANNITW DKGSVSGGGN 108 0 

FGVDNANANG NAVIKNVNFS DNGTLIYKGG ENSAGNSLTL ENNTFNSYNI NAKAQNLIFN 1140 

NNSFNSGSYS FNDTKNVTFK GTNTLINSDP FSRLKGSVSI DNNSIFNIER DLTDKTTYTL 120 0 

LSGDNIKYNN QALADNVFSK NLWDLIHYDG EQGTLLRTDN NTYFVQFTQS NGQKFVFEET 12 60 

FNPGSITYKY FTIHSSPFHT EADSKDIWNQ VRKQFDFIPG KTPVCVGVCY IAPYKNQDLI 132 0 

GSSAFAWSLN FGATWGTLL LGSAQEKANN NGGS I WFGKN NLL YLHGNFN ATNIFLTNNF 1380 

NVGNPNAGGG ATINFNADET LSADGLNYTN FQTVAMGLQT SASQHSWANF NSKLSMEIKN 1440 

SNFRDFTWGG FRFNSGRITF ENTTFSGWTN INGATESGSS YVNMVANTDL IFTDSILGGG 1500 

IRYDLKANNI I FNNTQMWD VSKNVNQSSL NGNVTFNHSR LSVKPNAAIN IGGDQTQTTL 1560 

ENASSLSFYN DSVAMFNGTT AFNGVS YLNL NPNAQVSFNQ ANFNNANVTF YGI PLFGKTP 162 0 

NFGNSVRLIN FKGDAKFNQA TLNLRAKNIH LNFQGASTFE NNSTMNLAES SQASFNALSV 168 0 

EGETNFNLNG SSLLS FNGNS VFNAPVNFYA NNSQISFTHS AT FNADAS FD LGNNSTLNFQ 1740 

SVLLNSALNL LGNGGNNLAI NAKGNFSFGS QGILNLSYMN LFGGDKKASV YDVLQAQNID 180 0 

GLRGNNGYEK IRFYGIQIEK ADYSFNNGVH SWSFTNPLNT TETITETLHN NRLKVQISQM 1860 

GASNNAMFNL APSLYDYQQN PYDESENSYN HTSDKAGTYY LSSSIKGFGK NNEIPGTYNA 192 0 

QNQPLQALHI YNQAISKQDL NMIASLGKEF LPKVAKLIAS GALDNLNLNS PDSFETIFSI 1980 

LKEYGITLNQ ANWKSLLKII NNFSNTANYH FSQGSIiWGA IKEGQTNTNS WWFGGDGYK 2 04 0 

NPCAVGDNTC QMFRQTNLGQ IiLNSSVPYLG YINANFKAKN IYITGTIGSG NAWGSGGSAN 2100 

VSFESATNLV LNQANIDAQG TDKIFSYLGK EGIDKLFGEK GLGNVLSNIV YEESLNDNAI 2160 

PKDLANMIPK DLGSKTLSSL LSPTEVNNLL GVSAFKNAIM EILNSKTVGD VFGENGLLNA 2 22 0 

LDPVKRKEID QMLLEQIQAH SSGFEKFIVK TLGIENVENF INNWYGKQSL SSFANNFVPG 2 280 

GLNQALDKIG SSSDAKDLQS FLDKTTFGDI LNQMINQAPL INKLISWLGP QDLSVLVNIA 2 340 

LNS I TNPSKE LLGAISGMGQ KVLNDLLGEG WNKIMSNQV LGQMINKI I A DKGFGGVYHQ 2400 

GLGSILPKSL QDELKKLGMG SLLKPKGLHN LWQKGNFNFV AKNHVFVNNS LFSNATGGEL 2 460 
NFVAGKSIIF NGKNTINFTQ YQGRLSFVSK DFSNISLDTL NATNGLTLNA SKNDISVQKG 2520

QICVNVLDCM TAKGKTTQTN SSSSATAPTN ETLEVSANNF AFLGTIKANG LVDFSKVLQN 258 0 

TTIGTLDLGP NATFKANNLI VNNAFNNNSN YRANISGNFN VAKGATFSTN ENGLNVGGNF 2640

NSEGPLIFNL NNPTHQTIIN VTGTSTIMSY NNQALINFNT QLKQGAYTLI NANRMVYGYD 2700

NQTILGGSLS DYLKLYTLID FNGKRMQLNG DSLSYDNQPV SIKDGGLWS FKDNQGQMVY 2760

SSILYDKIQV TVSDKPMSIQ APSLEYYVKR IQGSAGLNAI KSAGNNSIMW LSELFAAKGG 28 2 0 

NPLFAPYYLQ DNPTEHIVTL MKDITSALGM LSNSNLKNNS TDVLQLNTYT QQMSRLAKLS 2 88 0 

NFAS FDSTDF SERLSSLKNQ RFADAVPNAM DVILKYSQRD KLKNNLWATG VGGVS FVENG 294 0 

TGTLYGVNVG YDRFVRGVIV GGYAAYGYSG FYERITSSKS DNVDVGLYAR AFIKKSELTF 30 0 0 

SVNETWGANK TQISSNDALL SMINQSYKYS TWTTNAKVNY GYDFMFKNKS IILKPQIGLR 3060

YYYIGMSGLE GVMNNVLYNQ FKANADPSKK SVLTIDFALE NRHYFNTNSY FYAIGGVGRD 312 0 

LLVNSMGDKL VRFIGNNTLS YRKGDLYNTF ANITTGGEVR LFKSFYANAG VGARFGLDYK 318 0 

MIDIIGNIGM RLAF 3194 
<212> Type : PRT 

<211> Length : 3194

SequenceName : SEQ ID 90 
SequenceDescription : 

Sequence 


<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MKQF KKKPKK IKRSHQNQKT ILKRPLWLMP LLIGGFASGV YADGTDILGL SWGEKSQKVC 60 

VHRPWYAIWS CDKWEEKTQQ FTGNQL1TKT WAGGNAANYY HSQNNQDITA NLKNDNGTYF 12 0 

LSGLYNYTGG EYNGGNLDIE LGSNATFNLG ASSGNSFTSW YPNGHTDVTF SAGTINVNNS 180

VEVGNRVGSG AGTHTGTATL NLNANKVTIN SNISAYKTSQ VNVGNANSVI TINSVSLNGD 2 40 

TCSSLARVGV GANCSTSGPS YSFKGTTNAT NTTFSNSSGS FTFEENATFS GAKLNGGAFT 300

FNKKFNATNN TAFNSGS FTF KGTSSFNGAN FSNASYTFNN QATFQNSSFN GGT FTFNDQT 3 60 

NQSTQHPQIQ NSSFSGSATT LKGFATFEQA FNNSNHQLTI QNASFNNATF NNTGKITIEK 4 20 

DASFNNTSFN TPVDTNNMTI SGGVTLSGKN DLKNGATLDF GSSKITLTQG TTFNLTSLGS 480

EKSVTILNSR GGITYNHLLN HAINSLTNAL KTNESSSKPQ SFAQGLWDMI TYNGVTGQLL 5 40 

NENAATSKPT DSSPSKSSTN STQVYQVGYK IGDTIYKLQE TFSHNSIIIQ ALESGTYTPP 600

PVINGSKFDL SASNYINADM PW YNHKYYI P KSQNFTESGT YYLPSVQIWG SYTNSFKQTF 6 60 

SASNSNLVIG YNATWTDHNV SSSDTVAFGD TSGSALNGHC GPWPYYQCTG TTNGTYSAYH 72 0 

VYITANLRSG NRIGTGGAAN LIFNGVDSIN IANATITQHN AGAYSSSMTF STQNMDNSQN 780

LNGLNSNGKL LVYGTTFTNQ AKDGKFIFNA GQATFENTNF NGGSYQFSGD SLNFSNNNQF 8 40 

NSGSFEIGAK NTIFNNANFN NSTSFNFNNS SATTSFVGDF TNANSNLQIA GNAVFGNSTN 900

GSQNTANFNN TGSVNIAGNA TFDNWFNSP TNTSVKGKVT LNNITLKNLN APLSFGDGTI 9 60 

VFSAHSVINI GEAITNGNPI TLVSSSKAIE YNDAFSKNLW QLINYQGHGA SSEKLVSSAG 10 2 0 

NGVYDWYSF NNQTYNFQEV FSPNSISIRR LGVGMVFDYV DMEKSDRLYY QNALGFMTYM 1080

PNSYNNNLGN LNNTIYYYDN SIDFYASGKT LFTKAEFSQT FTGQNSAIVF GAKNIWTSVS 1140 

DAPQSNVIIR FGDNKGAGSN DASGHCWNLQ CIGFITGHYE AQKIYITGSI ESGNRISSGG 12 00 

GASLNFNGLQ GILLTNATLY NRAAGTQSSS MNFVSNSANI QAQNSYFIDD TAQNKGNPNF 12 60 

SFNALNLDFS NSSFRGYVGQ TQSVFKFNAV NAISFTNSSN LSSGLYQMQA KSVLFDNSNL 13 2 0 

SVSVGTSSIK ANAINLSQNA SINASNHSTL ELQGDLNLND TSSLNLNQSA INVSNNATIN 1380

DYASLIASNG SHLNFNGAVN FNSANITTSL SSSSIVFKGA VSLRGQFNLS NNSSLDFQGS 14 4 0 

SAITSNTAFN FYDNAFSQSP ITFHQALDIK VPLSLGGNLL NPNNSSVLNL KNSQLVFSDQ 15 00 

GSLNIANIDL LSDLNGNKNR VYNIIQADMN GNWYERINFF GMRINDGIYD AKNQTYSFTN 15 60 

PLNNALKITE SFKNNQLSVT LSQIPGIKNT LYNIGSEIFN YQKVYNNANG VYSYSDDAQG 16 20 

VFYLTSSVKG YYNPNQSYQA SGSNNTTKNN NLTSESSVIS QTYNAQGNPI SALHVYNKGY 1680

NFSNIKALGQ MALKLYPEIK KILGNDFSLS SLSNLKGDAL NQLTKLITPS DWKNINELID 1740 

NANNSWQNF NNGTL 1 1 GAT KIGQTDTNSA WFGGLGYQK PCDYTDIVCQ KFRGTYLGQL 1800

LESISADLGY IDTTFNAKEI YLTGTLGSGN AWGTGGSASV TFNSQTSLIL NQANIVSSQT 18 60 

DGIFSMLGQE GINKVFNQAG LAN I LGEVAM QSINKAGGLG NLIVNTLGSD SVIGGYLTPE 19 20 

QKNQTLSQLL GQNNFDNLMN DSGLNTAIKD LIRQKLGFWT GLVGGLAGLG GIDLQNPEKL 1980

IGSMSINDLL SKKGLFNQIT GFISANDIGQ VISVMLQDIV KPSDALKNDV AALGKQMIGE 2040

FLGQDTLNSL ESLLQNQQIK SVLDKVLAAK GLGSIYEQGL GDLIPNLGKK GIFAPYGLSQ 2100

VWQKGDFSFN AQGNVFVQNS TFSNANGGTL SFNAGNSLIF AGNNHIAFTN HSGTLNLLSN 2160

QVSNINVTML NASNGLKINA TNNNVSVSQG NLFINASCVQ QSDPTTASAT NPCTTAQNNA 22 2 0 

SSSNASNNAP IALNNNDESL WTANGFNFS GNIYANGWD FSKIKGSANV KNLYLYNNAQ 2280

FQANNLTISN QAVLEKNASF VTNNLNIQGA FNNNATQKIE VLQNLVIASN ASLSTGIYGL 23 4 0 

EVGGALNNLG AIHFNLENSQ TPVNPLIQVG GIINLNTTQT P FMNVS VANG GTYTLLKSSR 24 00 

YIDYNINPNS LQSYLKLYTL ININGNHIEE KNGVLTYLGQ RVLLQDKGLL LSVALPNSNN 24 60 

ASQNNILSLS VLHNQIKMSY GNKVMDFTPP TLQDYIVGIQ GQSALNQIEA VGGNNA I KWL 25 2 0 

STLMMETKEN PLFAPIYLEN HSLNEILGVT KDLQNTASLI SNPNFRNNAT SLLEMASYTQ 2580

QTSRLTKLSD FRAREGESNF SERLLELKNK RFSDPNPSEV FVKYSQLSKH PNNLWIQGVG 2640 

GASFISGGNG TLYGLNVGYD RLVKSVILGG YVAYGYSGFN GNIMHSLANN VDVGMYARAF 2700 
LKRNEFTLSA NETYGGNASH INSSNSLLSV LNQRYNYNTW TTSVNGNYGY DFMFKQKSW 2760
LKPQVGLSYH FIGLSGMKGK MQNPAYQQFV MHSNPSNESV LTLNMGLESR KYFGKNSYYF 2820
VTARLGRDLL I KAKGDNWR FVGENTLLYR KGEIFNTFAS VI TGGEMHLW RLMYVNAGVG 2880 
LKMGLQYQDL NITGNVGMRV AF 2902 
<212> Type : PRT 
<211> Length : 2902

SequenceName : SEQ ID 91

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :

MAFKKARLIS RFISKGSFKL NKISK2CFFTL NQILKRFKPI* ICRHKKTKSIE KPFNKNKSFL * 60 

KASVLLIGAL GGLSHLRANE CRYWSWSSWS YQDNIESGPN SPTHNSYCLF SSAQGSGTYY 120 

LNTLTTYSAG GAS FTQKFNG GTLDIGGNIR FGGTGINGGD VGYITGTYNA QTMNFNSSHI 180 

TTGNSYADGG GTTLNFNATN NITINQASFD NSDAGTQKSY MNFKGSNIKI SGSSFTDDTN 240 

GGFNFSGNNN NSTISFNQTS FNQGTYNFSN SATLSFNNSN FNQGTYHFNS AQSTFENSNF 300 

NQGTYNFNDN TSFNNDTFNQ GTYNFNSSKV SFSGANTLNS SSPFASLKGS VSFNSGAIFN 3 60 

LNQTLNNNQT YDILTTNGAI QYGVYQSYLW DLINYKGDKA ISHVEVSNNT YDVTFDINGQ 420 

DETLQETFSN QSIITQFLGD DLQQQAQQTY QEDVANSQNA LNKVASDNTI ANNDTSYTQS 480 

SNPTILKDAQ GLENTNQQIQ QDEKALEKDL AQIKQIANST TGFNEQAFTQ AQKQEQQDEQ 540 

ALQNDENAFN TEQEGLEQAI ANAKHANPTP NPTPSPTPTP IKHTAPNTPP SQVPPTPPSQ 600 

NLPKTNVWNG VYWLQNKTYS NKGIYYIDPN LSGQSGQSGN TLSTYTANLL GRS FGVNANN 660 

GTLIIGNNTE SVNDNGLIWI GHGGFGYITG TFSAANIYLT NNFKTGEGVS NSDGGGANIT 720 

FKASDNI TMD GLNYNNAETV TKMIQTGASQ HSYTTFDATN NISVTDSDFS DMTWGKFSFS 780 

AKNISFSNAS FSGFTNPGGS STISTNASNS LSFTDSRLNG GAIYNLQANS LIFNNTQAVF 840 

NVLYSRGTSN FNATTQLLGN TSFTLSSQSL LNFNGDTTLQ NNANITLGNK SQAAFKNSLT 900 

LDNNSNLSLD NQSVLNANGT SAFNNQASLN IYNGSQAAFS SLFFNGGTLS LNANSKLNAS 960 

SASFSNNTTI NLDDSVLNAN NTSSLMANIN FQGASQADFG GNTTIDTASF NFDSASSLNF 1020 

NNLTANGALN FNGYAPSLTK ALMNVSGQFV LGNNGDINLS DINIFDNITK SVTYNILNAQ 1080 

KGITGISGAN GYEKILFYGM KIQNATYSDN NNIQTWSFIN PLNSSQIIQE SIKNGDLTIE 1140 

VLNNPNSASN TIFNIAPELY NYQDSKQNPT GYSYDYSDNQ AGTYYLTSNI KGLFTPKGSQ 1200 

TPQTPGTYSP FNQPLNSLNI YNKGFS SENL KTLLGILSQN SATLKEMIES NQLDNITNIN 12 60 

EVLQLLDKIK ITQAQKQALL ETINHLTDNI NQTFNNGNLV IGATQDNVTN STSSIWFGGN 1320 

GYSSPCALDS ATCSSFRNTY LGQLLGSTSP YLGYINADFK AKSIYITGTI GSSNAFESGG 13 80 

SADVTFQSAN NLVLNKANIE AQATDNI FNL LGQEGIDKIF NQGNLANVLS QMAMEKIKQA 1440 

GGLGNF I ENA LSPLSKELPA SLQDETLGQL IGQNNLDDLL NNSGVMNEIQ NIISQKLSIF 1500 

GNFVTPSIIE NYLAKQSLKS MLDDKGLLNF IGGYIDASEL SSILGVILKD ITNPPTSLQK 1560 

DIGWANDLL NEFLGQDWK KLESQGLVSN IINNVISQGG LSGVYNQGLG SVLPPSLQNA 1620 

LKENDLGTLL SPRGLHDFWQ KGYFNFLSNG YVFVNNSSFS NATGGS LNFV ANKSIIFNGD 1680 

NTIDFSKYQG ALIFASNGVS NINITTLNAT NGLSLNAGLN NVSVQKGEIC INLANCPTTK 1740 

NSSPANSSVT PTNESLSVHA NNFTFLGTII SNGAIDLSQV TNNSVIGTLN LNENATLQAN 1800 

NLTITNAFNN ASNSTANIDG NFTLNQQATL STNASGLNVM GNFNSYGDLV FNLSHSVSHA 18 60 

IINTQGTATI MANNNPLIQF NASSKEVGTY TLIDSAKAIY YGYNNQITGG SSLDNYLKLY 1920 

ALIDINGKHM VMTDNGLTYN GQAVSVKDGG LWGFKDSQN QYIYTSILYN KVKIAVSNDP 1980 

INNPQAPTLK QYIAQIQGVQ SVDSIDQAGG NQAINWLNKI FETKGS PL FA PYYLESHSTK 2040 

DLTTIAGDIA NTLEVIANPN FKNDATNILQ INTYTQQMSR LAKLSDTSTF ARSDFLERLE 2100 

ALKNKRFADA IPNAMDVILK YSQRNRVKNN VWATGVGGAS FISGGTGTLY GINVGYDRF I 2160 

KGVIVGGYAA YGYSGFHANI TQSGSSNVNV GVYSRAFIKR SELTMSLNET WGYNKTF INS 2220 

YDPLLSIINQ SYRYDTWTTD AKINYGYDFM FKDKSVIFKP QVGLSYYYIG LSGLRGIMDD 2280 

PIYNQFRANA DPNKKSVLTI NFALESRHYF NKNSYYFVIA DVGRDLFINS MGDKMVRFIG 2340 

NNTLSYRDGG RYNTFASIIT GGEIRLFKTF YVNAGIGARF GLDYKDINIT GNIGMRYAF 2399 

<212> Type : PRT 
<211> Length : 2399 

SequenceName : SEQ ID 92

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MEIQQTHRKI NRPLVSLVLA GALISAIPQE SHAAFFTTVI IPAIVGGIAT GTAVGTVSGL 60 

LSWGLKQAEE ANKTPDKPDK WRIQAGKGF NEFPNKEYDL YKSLLSSKID GGWDWGNAAR 12 0 

HYWVKGGQWN KLEVDMKDAV GTYKLSGLRN FTGGDLDVNM QKATLRLGQF NGNSFTSYKD 18 0 

SADRTTRVNF NAKNISIDNF VEINNRVGSG AGRKASSTVL TLQASEGITS SKNAEISLYD 240 
GATLNLASNS VKLNGNVWMG RLQYVGAYLA PSYSTINTSK VQGEVDFNHli TVGDQNAAQA 300
GIIASNKTHI GTLDLWQSAG LNIIAPPEGG YKDKPNSTTS QSGTKNDKKE ISQNNNSNTE 360
VINPPNNTQK TETEPTQVID GPFAGGKDTV VNIFHLNTKA DGTIKVGGFK ASLTTNAAHL 420
NIGKGGVNLS NQASGRTLLV ENLTGNITVD GPLRVNNQVG GYALAGSSAN FEFKAGVDTK 480
NGTATFNNDI SLGRFVNLKV DAHTANFKGI DTGNGGFNTL DFSGVTDKVN INKLITASTN 540
VAVKNFNINE LIVKTNGISV GEYTHFSEDI GSQSRINTVR LETGTRSIFS GGVKFKSGEK 600
LVINDFYYSP WNYFDARNVK NVEITRKFAS STPENPWGTS KLMFNNLTLG QNAVMDYSQF 660
SNLTIQGDFI NNQGTINYLV RGGKVATLNV GNAAAMMFNN DIDSATGFYK PLIKiNSAQD 720
LIKNTEHVLL KAKIIGYGNV STGTNGISNV NLEEQFKERL ALYNNNNRMD TCWRNTDD I 780
KACGMAI GNTQ SMVNNPDNYK YLIGKAWRNI GISKTANGSK ISVYYLGNST PTENGGNTTN 840
LPTNTTNNAH SANYALVKNA P FAHS AT PNIi VAINQHDFGT IESVFELANR SKDIDTLYTH 900
SGAQGRDLLQ TLLIDSHDAG YARQMIDNTS TGEITKQLNA ATDALNNVAS LEHKQSGLQT 960
LSLSNAMILN SRLVNLSRKH TNHINSFAQR LQALKGQEFA SLESAAEVLY QFAPKYEKPT 1020
NVWANAIGGA SLNSGSNASL YGTSAGVPAF LNGNVEAIVG GFGSYGYSSF SNQANSLNSG 1080
ANNANFGVYS RFFANQHEFD FEAQGALGSD QSSLNFKSTL LQDLNQSYNY LiAYSATARAS 1140
YGYDFAFFRN ALVLKPSVGV SYNHLGSTNF KSNSQSQVAL KNGASSQHLF NANANVEARY 1200
YYGDTSYFYL HAGVLQEFAH FGSNDVASLN TFKINAARSP LSTYARAMMG GELQLAKEVF 1260
LNLGWYLHN LISNASHFAS NLGMRYSF 1288
<212> Type : PRT
<211> Length : 1288

SequenceName : SEQ ID 93
SequenceDescription :

Sequence 

<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MKKHILSLTL GSLLVSTLSA EDDGFYTSVG YQIGEAAQMV TNTKGIQDLS DRYESLNNLL 60
NRYSTLNTLI KLSADPSAIN AVRENLGASA KNLIGDKANS PAYQAVLLAI NAAVGFWNW 120
GYVTQCGGNA NGQKSISSKT IFNNEPGYRS TSITCSLNGH SPGYYGPMSI ENFKKLNEAY 180
QILQTALKRG LPALKENNGK VNVTYTYTCS GDGNNNCSSQ VTGWNQKDG TKTKIQTIDG 240
KSVTTTISSK WDSRADGNT TGVSYTEITN KLEGVPDSAQ ALLAQASTLI NTINNACPYF 300
HASNSSEANA PKFSTTTGKI CGAFSEEISA IQKMITDAQE LVNQTSVINE HEQTTPVGNN 360
NGKPFNPFTD ASFAQGMLAN ASAQAKMLNL AEQVGQAINP ERLSGTFQNF VKGFLATCNN 420
PSTAGTGGTQ GSAPGTVTTQ TFASGCAYVG QTITNLKNSI AHFGTQEQQI QQAENIADTL 480
VNFKSRYSEL GNTYNSITTA LSNIPNAQSL QNAVSKKNNP YSPQGIDTNY YLNQNSYNQI 540
QTINQELGRN PFRKVGIVSS QTNNGAMNGI GIQVGYKQFF GQKRKWGARY YGFFDYNHAF 600
IKSSFFNSAS DVWTYGFGAD ALYNFINDKA TNFLGKNNKL SVGLFGGIAL AGTSWLNSEY 660
VNLATMNNVY NAKMNVANFQ FLFNMGVRMN LARPKKKDSD HAAQHGIELG LKIPTINTNY 720
YSFMGAELKY RRLYSVYLNY VFAY 744
<212> Type : PRT
<211> Length : 744

SequenceName : SEQ ID 94
SequenceDescription :



Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MIKKAKKFIP FFLIGSLLAE DNGWYMSVGY QIGGTQQFIN NKQLLENQNI INSITQSAIN 60
IAGPTTGLIT LSSQTVIDAL GYGVSNTVGN QLEGISNILN QIGKRKDFYS SRQISSISQQ 120
IIGLKGSSDP LKAHSSQITA KLLSNTQSAF DQGIALSSNI ISAVNSLNPS NNSQEVKAQL 180
QNTAQSMAEL LQQIEHSITK TTSTTYAQSL LSNLTDAVNA SSNNTTYVSA LVNALNTLGV 240
GVFPTTTSTH WLNPPGQW FYPTNSLLGS TSSNSNNQQQ YNNTLLMNTL QGELSTNNQN 300
NPNGCANQIQ CLEQFIQNLT PLAATPTSTN QANQQVQAIA QKLQSVAINA LDNNAINNTT 360
YNLNNLHNAL NFQAYQSTIE QYNNALKQIS WISFSEPKNL LKNTSNNYQI GTVTNDQGQN 420
ISAYDCTSAT GSLSSDASSG ISCSATSSTN NTNSFDNSLV ATSKVQTING KEQIGVNSFN 480
LVSQVWSVYN SLKTSEENLQ KNAKILCNNG SQSGTSPCNS SSGGLSISGN AQLQNILSPT 540
NGTTTNTQAK SNASKLKAMV MVNNEEEAKT TNFNQSSGPT TQSSNSTVMG ALNTVLQNVS 600
NFQQSIQSAF QNQENNIQAW ANALYNTSNP NGNQSQNLTT NNNQDLRIQL RANFYQLINT 660
INQQVPTDMN ALINQSQQTQ QTSGSASTTN NACASGMGSS GNWCYQQWSD SKAYYSGLQS 720
ALGYQTQATT QNGSSGGSNI TYNVQQITLT SGGLLNQIIT NLKSVNGGSN GGSSGNGTSQ 780
INTAYQMLTD ASDGKLGTYN SSNSSNSSNS GNNNGYTPCN STNGSNGTSG SNCYEPNKQQ 840
NATTATTTTD SNLQKVYNDA QKIANIIASS GNNKGVENGL KQFFEALKSN SSSLSNLCGN 900
GSSGSSSTCS GGLINLLGAI PTNGVSDTNN LINLLTEFIK TAGFIQNKDS NVSTSLTSAF 960
QAITSAISQG FQALQNDISP NAILTLLQEI TSNTTTIQSF SQTLRQLLGD KTFFMVQQKL 1020
IDAMINARNQ VQNAQNQANN YGSQPVLSQY AAAKSTQHGM SNGLGVGIGY KYFFGKARKL 1080
GLRHYFFFDY GFSEIGLANQ SVKANIFAYG VGTDFLWNLF RRTYNTKALN FGLFAGVQLG 1140
GATWLSSLRQ QIIDNWGNAN DIHSTNFQVA LNFGVRTNFA EFKRFAKKFH NQGVISQKSV 1200 
EFGIKVPLIN QAYLNSAGAD VSYRRLYTFY INYIMGF 1237 
<212> Type : PRT 
<211> Length : 1237

SequenceName : SEQ ID 95 

SequenceDescription : 

Sequence 


<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :

MKQNLKPFKM IKENLMTQSQ KVRFLAPLSL ALSLSFNPVG AEEDGGFMTF GYELGQWQQ 60 
VKNPGKIKAE ELAGLLNSTT TNNTNINIAS TGGNVAGTLG KfoFMNQXiGNL IDLYPTLKTN 120

NLHQCGSTNS GNGATAAAAT NNSPCFQGNL ALYNEMVDSI KTLSQNISKN IFQGDNNTTS 180

ANLSNQLSEL NTASVYLTYM NSFLNANNQA GGIFQNNTNQ AYENGVTAQQ IAYVLKQASI 240 
TMGPSGDSGA AGAFLDAALA QHVFNSANAG NDLSAKEFTS LVQNIVNNSQ NALTLANNAN 3 00 

ISNSTGYQVS YGGNIDQARS TQLLNNTTNT LAKVTALNNE LKANPWLGNF AAGNSSQVNA 3 60 

FNGFITKIGY KQFFGENKNV GLRYYGFFSY NGAGVGNGPT YNQVNLLTYG VGTDVLYNVF 420 

SRSFGSRSLN AGFFGGIQLA GDTYISTLRN SPQLASRPTA TKFQFLFDVG LRMNFGILKK 480

DLKSHNQHSI EIGVQIPTIY NTYYKAGGAE VKYFRPYSVY WVYGYAF 527 
<212> Type : PRT 
<211> Length : 527 

SequenceName : SEQ ID 96 

SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99

<400> PreSequenceString :

MKKTLLLSLS LSFGLHAEDD GFYASAGIRI GEAAQMVKNT KGIQQLSENY EKLNNLLNNY 60 

NTLNTLVKLS SDPSAVNDAR DNLGSSTRNL LDVKANS PAY QAVLLALNAA VGLWQVTSYA 12 0 

FTACGPGSNE NANGGIQTFN NVPGQNTTTI TCNSYYEPGH GGPISTKNYA IINKAYQIIQ 180 

KALTANGEGI PVLSNTTTKL DFTINGDKRT GGEPNKKLVY PWSHGKAIST S WNAT I TAP T 240 

TENINTTNSA QELLKQASII ITTLNSACPN FQNGGSGYWA GISGNGTMCG MFKNEISAIQ 300

GM I ANAQEAV AQAKIVSENT QNQNS LDAGK PFNPYTDASF AES MLKNAQA QAEILNQAEQ 3 60 

WKNFEKIPT AFVNDSLGVC YEVQGGERRG TNP GQTTSNT WGAGCAYVGQ TITNLKNSIA 420 

HFGTQEQQIQ QAENIADTLV NFKSRYSELG NTYNSITTAL SNIPNAQSLQ NAVS KKNNP Y 480 

SPQGIDTNYY LNQNSYNQIQ TINQELGRNP FRKVGIVSSQ TNNGAMNGIG IQVGYKQFFG 540 

QKRKWGARYY GFFDYNHAFI KSSFFNSASD VWTYGFGADA LYNFINDKAT NFLGKNNKLS 600

VGLFGGIALA GTSWLNSEYV NLATMNNVYN AKMNVANFQF LFNMGVRMNL ARPKKKDSDH 660 

AAQHGIELGL KIPTINTNYY SFMGAELKYR RLYSVYLNYV FAY 703 
<212> Type : PRT 
<211> Length : 703 

SequenceName : SEQ ID 97

SequenceDescription : 



Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MIKKNRTLFL SLALCASISY AEDDGGFFTV GYQLGQVMQD VQNPGGAKSD ELARELNADV 60 

TNNILNNNTG GNVAGAL SNA FSQYLYSLLG AYPTKLNGND VSANALLSGA VGSGTCAAAG 120 

TAGGTTLNTQ SACTAAGYYW LPSLTDRILS TIGSQTNYGT NTNFPNMQQQ LTYLNAGNVF 180 

FNAMNKALEK NGTATANSTS STSGATGSDG QTYSQQAIQY LQGQQNILNN AANLLKQDEL 240

LLEAFNSAVA ANIGNKEFNS AAFTGLVQGI IDQSQLVYNE LTKNTISGSA VNNAGINSNQ 3 00 

ANAVQGRASQ LPNALYNVQV TLDKINALNN QVRSMPYLPQ FRAGNS RATN ILNGFYTKVG 3 60 

YKQFFGKKRN IGLRYYGFFS YNGASVGFRS TQNNVGLYTY GVGTDVLYNI FSRSYQNRSV 42 0 

DMGFFSGIQL AGETFQSTLR DDPNVKLHGK INNTHFQFLF DFGMRMNFGK LDGKSNRHNQ 480 

HTVEFGWVP TIYNTYYKSA GTTVKYFRPY SVYWSYGYSF 520



<212> Type : PRT 

<211> Length : 520 

SequenceName : SEQ ID 98 
SequenceDescription : 


Sequence 
<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MKKKFLSLTL GSLLVSALSA EDNGFPVSAG YQIGESAQMV KNTKGIQDLS DSYERLNNLL 60 

TNYSVLNALI RQSADPNAIN NARGNLNASA KNLINDKKNS PAYQAVLLAIi NAAAGLWQVM 120 

SYAISPCGPG KDTSKNGGVQ TFHNTPSNQW GGTTITCGTT GYEPGPYSIL STENYAKINK 180

AYQIIQKAFG SSGKDIPALS DTNTELKFTI NKNNGNTNTN NNGEEIVTKN NAQVLL EQAS 240 

TIITTLNSAC PWINNGGAGG ASSGSLWEGI YLKGDGSACG IFKNEISAIQ DMIKNAAIAV 3 00 

EQSKIVAANA QNQRNLDTGK TFNPYKDANF AQSMFANAKA QAEILNRAQA WKDFERIPA 3 60 

EFVKDSLGVC HEVQNGHLRG TPSGTVTDNT WGAGCAYVGE TVTNLKDSIA HFGDQAERIH 42 0 

NARNLAYTLA NFSSQYQKLG EHYDSITAAI SSLPDAQSLQ NWSKKTNPN SPQGIQDNYY 480

IDSNIHSQVQ SRSQELGSNP FRRAGLIAAS TTNNGAMNGI GFQVGYKQFF GKNKRWGARY 540 

YGFVDYNHTY NKSQFFNASS DVWTYGVGSD LLVNFINDKA TKHNKISFGA FGGIALAGTS 600 

WIiNS Q YVNLA NVNNYYKAKI NTANFQFLFN LGLRMNLARK KHRATDNAAQ HGIELGTKIP 660 

TINTNYYSLL GTTLQYRRLY SVYLNYVFAY 690

<212> Type : PRT

<211> Length : 690 

SequenceName : SEQ ID 99
SequenceDescription : 



Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :

MKIKKSLFAL SFSLMASLSR AEDDGFYMSV GYQIGEAVQK VKNTGALQNL ADRYDNLSNL 60 

LNQYNYLNSL VNLASTPSAI TGAIDNLSSS AINLTSATTT SPAYQAVALA LNAAVGMWQV 120

IAFGISCGPG PNLGPEHLEN GGVRS FDNTP NYSYNTGSGT TTTTCNGASM VGPNGILSSS 18 0 

EYQVLNTAYQ TIQTALNQNQ GGGMPALNSS KNMWNINQT FTKNPTTEYT YPDGNGNYYS 240 

GGSSIPIQLK ISSVNDAENL LQQAATIINV LTTQNPHVNG GGGAWGFGGK TGNVMDIFGD 3 00 

SFNAINEMIK NAQAVLE KTQ QLNANENTQI TQPDNFNPYT SKDTQFAQEM LNRANAQAEI 3 60 

LSLAQQVADN FHSIQGPIQQ DLEECTAGSA GVINDNTYGS GCAFVKETLN SLEQHTAYYG 420

NQVNQDRALS QTILNFKEAL STLGNDSKAI NSGISNLPNA KSLQNMTHAT QNPNSPEGLL 480 

TYSLDTSKYN QLQTVAQELG KNPFRRIGVI NYQNNNGAMN GIGVQAGYKQ FFGKKRNWGL 540 

RYYGFFDYNH AYIKSNFFNS ASDWTYGVG MDALYNFIND KNTNFLGKNN KLSVGLFGGF 60 0 

ALAGTSWLNS QQVNLTMMNG IYNANVSASN FQFLFDLGLR MNLARPKKKD SDHAAQHGME 660 

LGVKIPTINT DYYSFMGAEL KYRRLYSVYL NYVFAY 696



<212> Type : PRT

<211> Length : 696 

SequenceName : SEQ ID 100 
SequenceDescription :


Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MKIKKSLFAL SFSLMASLSR AEDDGFYMSV GYQIGEAVQK VKNTGALQNL ADRYDNLSNL 60

LNQYNYLNSL VNLASTPSAI TGAIDNLSSS AINLTSATTT SPAYQAVALA LNAAVGMWQV 12 0 

IAFGISCGPG PNLGPEHLEN GGVRSFDNTP NYSYNTGSGT TTTTCNGASN VGPNGILSSS 18 0 

EYQVLNTAYQ TIQTALNQNQ GGGMPALNSS KNMWNINQT FTKNPTTEYT YPDGNGNYYS 240 

GGSSIPIQLK ISSVNDAENL LQQAATIINV LTTQNPHVNG GGGAWGFGGK TGNVMDIFGD 30 0 

SFNAINEMIK NAQAVLEKTQ QLNANENTQI TQPDNFNPYT SKDTQFAQEM LNRANAQAEI 360

LSLAQQVADN FHSIQGPIQQ DLEE CTAGS A GVINDNTYGS GCAFVKETLN SLEQHTAYYG 42 0 

NQVNQDRALS QTILNFKEAL STLGNDSKAI NSGISNLPNA KSLQNMTHAT QNPNSPEGLL 480 

TYSLDTSKYN QLQTVAQELG KNPFRRIGVI NYQNNNGAMN GIGVQAGYKQ FFGKKRNWGL 54 0 

RYYGFFDYNH AYIKSNFFNS ASDWTYGVG MDALYNFIND KNTNFLGKNN KLSVGLFGGF 600 

ALAGTSWLNS QQVNLTMMNG IYNANVSASN FQFLFDLGLR MNLARPKKKD SDHAAQHGME 660

LGVKIPTINT DYYSFMGAEL KYRRLYSVYL NYVFAY 696 
<212> Type : PRT 
<211> Length : 696 

SequenceName : SEQ ID 101 



SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :

MHKKVLLALT ASLICQESLF AKDKDYTLGK VSTAGKKDRS DYSGQVNLGY SGITAPKSWQ 60 
DEEVKKYTGS RTVI SNKALT QQANQSIEEA LQNVPGLQIR NATGVGAMPT IQIRGFGAGG 120 
SGHSDATLML VNGIPVYMAP YAHIELDIFP VTFQAIDRID VIKGGGSVQY GPNTYGGIVN 180

IITKPIPNQW ENQAAERITY WAKARNAGFA APPDKTGDPS FIKSLGNNLL YNTYVRSGGM 240 

INKHVGIQAQ ANWVRGQGFR DMSPSSISNY WLDGVYDINE SNGIKAYYQY YDFAIAQPGS 300 

LSEQDYKINR FANLRPLNQK GGRSQRFGAV YENRFGDLDR VGGTFS FT YY GQLMTRDFQV 3 60 

SSSYNSANMV TCFSEAACRA AGLPAGYNLA VPYYATNYNG WAEVENPVRS INNAFEPKVN 420

LIVNTGKVRQ TFIMGLRFMT TTFLQRQYLN TNECATKTSG EGAGFLCEGP NVMSGWKPHI 48 0 

KHGVYRNWNN WRJflNYTAVYL SDRI EAWDGR FFIVPGLRYA FVQYNNENAS NWMQI PEKDL 540 

RKIKHMNNWM PSTNIGFIPV QGDHNVLTYF NYQRSFVPPQ LDVLSYGGAE YFTQHFDTVE 600 

AGARYTYKDK FSFNADYFRI WARD FATGQ Y SVYTSGPMKG NVRPINGYSQ GVELELYYRP 660 

IRGLQFHAAF NYIDTRVTSH GPLTDLNGDV LKGTSYNKHF PFVSPFQFIF DARYNWRKTT 720

IGISSYFYSR AYSGI SNSAA GGYYGMQYYS GGNNYESVLN SGYQCEAWCM TQHEGLLPWY 780 

WVWNIQVSQI FWENGRHRVT GSLQINNIFN MKYYFTGIGS SPAGLQPAPG RSVTAYLNYT 840 

F 841 
<212> Type : PRT 

<211> Length : 841

SequenceName : SEQ ID 102 
SequenceDescription : 



Sequence 


<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MKKTLLLSLS ASSLLNAEDN GFFISAGYQI GEAAQMVKNT GELKKLSDTY ENLSNLLTNF 60 
NNLNQAVTNA SSPSEINAAI DNLKANTQGL I GE KTNS PAY QAVYLALNAA VGLWNVIAYN 120 

VQCGPGNSGQ QSVTFEGQPG HNSSSINCNL TGYNNGVSGP LSIENFKKLN QAYQTIQQAL 180
KQDSGFPVLD SAGKQVTITI TTQTNGANKS ETTTTTTTTN DAQTLLQEAS KM I SVLTTNC 240 
PWVNHNQGQN GGAPWGLDTA GNVCQVFATE FSAVTSMIKN AQEIVTQAQS LNQQNNQNAP 3 00 

QDFNPYTSAD RAFAQNMLNH AQAQAKILEL ADQMKKDLNT IPSQFITNYL AACHNGGGTL 3 60 

PDAGVTNNTW GAGCAYVEET ITALNNSLAH FGTQAEQIKQ SELLARTILD FRGSLSNLNN 42 0 

TYNSITTTAS NTPNSPFLKN LISQSTNPNN PGGLQAVYQV NQSAYSQLLS ATQELGHNPF 480
RRVGLI S SQT NNGAMNGIGV QVGYKQFFGE KRRWGLRYYG F FD YNHAY I K SSFFNSASDV 540 
FTYGVGTDVL YMF I NDKTTK NSKISFGVFG GIALAGTSWL NSQYVNLATF NNFYSAKMNV 600 
ANFQFLFNLG LRMNLAKNKK KASDHAAQHG VELGVKIPTI NTNYYSLLGT QLQYRRLYSV 660 
YLNYVFAY 668 

<212> Type : PRT

<211> Length : 668 

SequenceName : SEQ ID 103 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MRKLFIPLLL FSALEANEKN GFFIEAGFET GLLEGTQTQE KRHTTTKNTY ATYNYLPTDT 60 

ILKRAANLFT NAEAISKLKF SSLSPVRVLY MYNGQLTIEN FLPYNLNNVK LSFTDAQGNT 120

IDLGVIETIP KHSKIVLPGE AFDS LKEAFD KIDPYTLFLP KFEATSTSIS DTNTQRVFET 18 0 

LNNIKTNLIM KYSNENPNNF NTCPYNNNGN TKNDCWQNFT PQTAEEFTNL MLNMIAVLDS 24 0 

QSWGDAILNA PFEFTNSSTD CDSDPSKCVN PGVNGRVDTK VDQQYILNKQ GIINNFRKKI 3 00 

EIDAWLKNS GWGLANGYG NDGEYGTLGV EAYALD PKKL FGNDLKTINL EDIiRTILHEF 360 

SHTKGYGHNG NMTYQRVPVT KDGQVEKDSN GKPKDSDGLP YNVCSLYGGS NQPAFPSNYP 420

NSIYHNCADV PAGFLGVTAA VWQQLINQNA LPINYANLGS QTNYNLNASL NTQDL A2STS ML 480 

STIQKTFVTS SVTNHHFSNA SQSFRSPILG VNAKIGYQNY FNDFIGLAYY GI I KYNYAKA 540 

VNQKVQQLSY GGGIDLLLDF ITTYSNKNSP TGIQTKRNFS SSFGIFGGLR GLYNSYYVLN 600 

KVKGSGNLDV ATGLNYRYKH SKYSVGISIP LIQRKASWS SGGDYTNSFV FNEGASHFKV 660 

FFNYGWVF 668



<212> Type : PRT 

<211> Length : 668 

SequenceName : SEQ ID 104 
SequenceDescription : 


Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 
MNKTTIKILM GMALLSSLQA AEAELDEKSK KPKFADRNTF YLGVGYQLSA INTSFSTSSI 60
DKSYFMTGNG FGWLGGKFV AKTQAVEHVG FRYGLFYDQT FSSHKSYIST YGLEFSGLWD 120 
AFNSPKMFLG LEFGLGIAGA TYMPGGAMHG IIAQYLGKEN SLFQLLVKVG FRFGFFHNEI 180 
TFGLKFPVIP NKKTEIVDGL SATTLWQRLP VAYFNYIYNF 220
<212> Type : PRT 
<211> Length : 220 

SequenceName : SEQ ID 105 
SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKKTKKTILL SLTLAASLLH AEDNGVFLSV GYQIGEAVQK VKNADKVQKL SDVYEQLSKL 6 0 

LANDNGTSSK TSAQAINQAV NNLNESAKTL AGGTTNS PAY QATLLALRSA LGLWNSMGYA 12 0 

WCGGY I KKP GENNQKNFHY TDENGNGTTI NCGGS TNSNG THS PNGTNTL KADKMVSLSI 18 0 
EQYEKIKEAY QILSKALKQA GLAPLNSKGE KLEAHVTTSK DQQGTSSDQT TTTTSVIDTT 240

NDAQNLLTQA QTIVNTLKDY CPMLIAKSSS NGGTNGANTP SWQTAGGGKN SCATFGAEFS 300

AISDMISNAQ KIVQETQQLN ANQPKNITQP NNFNLNSPGS LTALAQSMLK NAQSQTEILK 3 60 

LANQVASDFD KLSSGYLKDY IGKCDVSGVS SSNMTPQNMN TTWGKGCAGV EETLTSLKAS 42 0 

TTDFNNQTTP QLDQAQTLAN TLTQELGNNP FKRVGIIGSQ TNNGAMNGLG VQAGYKQFFG 480 

QKRRWGLRYY GFFDYNHTYI KSSFFNSSSD VLTYGVGSDL LFNFINDKNT MFLGKNNKIS 540 

VGLFGGIALA GTSWLNSQFV NLKTISNVYS AKVNTANFQF LFNLGLRTNL ARPKKKDSDH 600

SAQHGMELGV KIPTINTNYY SYLGTKX.EYR RLYSVYIiNYV FAY 643 
<212> Type : PRT 
<211> Length : 643 

SequenceName : SEQ ID 106 

SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKKTILLSLM VSSLFAENDG VYMSVGYQIG EAAQMVKNTG EIQKVSNAYE NLNNLLTRYW 60 

ELKQTASNTD SSTAQAIDNL EKSASRLKTT PNTANQAVSS ALSSAVGMWQ VIASNLAtTNS 12 0 

LSSSEYKKLK ATSQLLQNTL ENKNNNLKIE NDYDQLLTQA STIINTLQSQ CPGVDGGNGK 180 

PWGINTSGNA CAIFGSTFNA IKTSMIDSAKK AAADARRTAP ESPNQQNAFT NADFNKNLNQ 24 0 

VSSVINDTIS YLKGDNLETI YNTIQKTPNS KGFQSLVSRS SYSYSLNETQ YSQFQTTTKE 300

FGHNPFRSVG L INS QSNNGA MNGVGVQLGY KQFFGKNKFF GIRYYGFFDY NYAYI KSNFF 3 60 

NSASNVFTYG AGSDLLLNFI NGGSDRNRKV SFGIFGGIAL AGTTWLNNQS ANLKITNSAY 420 

SAKINNTNFQ FLFNTGLRLQ GIHHGIELGV KIPTINTNYY SFMGAKLAYR RLYSLYLNYV 480 

LAY 483 

<212> Type : PRT

<211> Length : 483 

SequenceName : SEQ ID 107 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MPKASQVLFF GAFLSTSLQG FEAKLNGFVD QSSTIGFNQH KINKERGIYP MQQFAT I AGY 60 

LGLGFSLLPK KVSDHVLKGK IGGMVGSIFY DGTKKFEDGS VAYNLFGYYD GFMGVYTNIL 120

QTDSLETQNM KHNKNVRNYV FSDAYLEYAY KNYFE I KAGR YLSTMPYKSG QTQGFQVSGQ 180 

YKHARLTWFS SWGRAFAYGS FLMDWFAART TYSGGFTKNN NGG YDS HGRK VLYGTHAVQL 24 0 

TYKPHRFLIE GFYYLSPQIF NAPGVKI GWD SNPNFSGTGF RSDTAIIGFF PIYYPWMIVK 300 

SNGS PVYRYD TPATQNGQNL IIRQRFDINN YNVSIAFYKV FQNANGWIGN MGNPSGVIMG 3 60 

SNSVYAGFTG TALKRDAATI FLSCGGTHFA KKFTWKFATQ YSNSWSWEA RAMISLGYKF 420

TEYLSGSVDL AYYGVHTNKG FKPGENGPVP KNFPALYSDR SALYTALVAS F 471 



<212> Type : PRT 
<211> Length : 471 
SequenceName : SEQ ID 108

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MLRLVSKTIC LSLISLFNPL EAFQKHQKDV FFVEAGFETG LLEGAQTKEQ AIAQNTQNTQ 60 
KIYENPLTHP QTKEQPKEQN KSDTATPQSV YGRYYILQNT ILEKATELFT AANINGNGLT 120

FYSQNPVYVM AYNKDNAEFE GYGNNSVWI QNFLPYNLNN IELSYTDAQG KAVNLGVIET 18 0 

IPKDSQIILP ASLFNNFSND SPFNSDGLQQ LQTTTTPFSD ANTQSLFEKL SQITTNLQMT 24 0 

YENTDPFSSG NNDPNGPLAS PKPHYECPGY KKSCQVASVS FTPQTAEELT NLMLDMIAVF 3 00 

DSKSWEEAVL NAPFQFSNSP SECGIDYPKC VNPFNNGLVD PKDEKYALTP EEVINSYRVA 360

NELTVNLLNA AKGFLGLGSQ LGSANAPDDD GFNQGVLGIA PFALDPEKLF GKNLNKVAIL 420 

ALRDIIHEYG HTLGYTHNGN MTYQRVRLCQ EGNGPEARCE GGHEVEKNGK EELEFSNGHE 48 0 

VRDHDGYTYD VCSRFGGKNQ PAFPSNYPNS I YTNCAQ VP A GL I GVTTAVW QQLINQNALP 540 

INFANLtTSQT SHLNAGLNAQ NFATSMVSAI AQNFSTTSTT TYRSSSKNFR SPILGVNVKI 60 0 

GYQHYFNDYI GLAYYGIIQY NYAQANDEKI QQLSYGGGMD VLFDFITTYT NKKQDHPTKK 660

VFASSFGVFG GLRGLYNSYY VFNQVKGSGN LDIVTGFNYR YKHSKYSIGV^ SVPLIQSGIK 720 

IASNWGIYAD SWLNEGGS H FKVFFNYGWV F 751 
<212> Type : PRT 

<211> Length : 751

SequenceName : SEQ ID 109

SequenceDescription : 



Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MQNFVFNKKW LIYSSLLPLF FLNPLMAEDD GFFMGVSYQT SLAVQRVDNS GLNASQDAST 60 

YIRQNAIALE SAAVPLAYYL EAMGQQTRVL MQMLCPDPSK RCLLYAGGYQ NGQNNNGDTG 120 

NNPPRGNVNA TFDMQSLVNN LNKLTQLIGE TLIRNPENLP NSKVFNVKFG NQSTVIALPE 180 

GLANTMDALN NDITNALTTL WYNQTLTNKS FSTPSNTSVN FSPQVLQHLL QDGLATANNN 240

QTICSTQNQC TATNEAKS I A QNAQNIFQAL MQAGILGGLA NEKQFGFTYN KAPNGSDSQQ 3 00 

GYQSFSGPGY YTKNDNTTQA PLKALPAGAT IGSGNGQYTY HPS S AVYYLA DSIIANGITA 3 60 

SMIFSGMQNF ANKAAKL I GT SSYNQMQDAI NYGESLLSNT VAYGDFITNW VAPYLDLNNK 42 0 

GLNFLPNYGG QLMGANNQTP QLTPQQAQQE QKVIMNQLEQ ATNAPTPAQI NRILANPYSP 4 80 

TAKTLMAYGL YRSKAVIGGV IDEMQTKVNQ VYQMGFARNF LEHNSNSNNM NGFGVKMGYK 540

QFFGKKRMFG LRYYGFYDFG YAQFGTESSL VKATLSSYGA GTDFLYNVFT RKRGTEAIDI 600 

GFFAGIQLAG QTWKTNFLDQ VDGNHLKPKD TSFQFLFDLG IRTNFSKIAH QKRSRFSQGI 660 

EFGLKIPVLY HTYYQSEGVT AKYRRDFSFY VGYNIGF 697 
<212> Type : PRT 

<211> Length : 697

SequenceName : SEQ ID 110 
SequenceDescription : 



Sequence 


<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MKKTILLSLS LSLASSLLHA EDNGFFVSAG YQIGEAVQMV KNTGELKNLN EKYEQLSQYL 60 

NQVASLKQSI QNANNIELVN SSLNYLKSFT NNNYNSTTQS PIFNAVQAVI TSVLGFWSLY 120 

AGNYLTFFW NKDTQKPASV QGNPPFSTIV QNCSGIENCA MNQTTYDKMK KLAEDLQAAQ 180

QNATTKANNL CALSGCATTQ GQNPSSTVSN ALNLAQQLMD L I ANTKTAMM WKNIVIAGVS 240 

NVSGAIDSTG YPTQYAVFNN IKAMIPILQQ AVTLSQSNHT LSASLQAQAT GSQTNPKFAK 3 00 

D I YAFAQNQK QVISYAQDIF NLFSSIPKDQ YRYLEKAYLK IPNAGKTPTN PYRQEVNLNQ 360 

EIQTIQNNVS YYGNRVDAAL SVAKDVYNLK SNQTEIVTTY NNAKNLSQEI SKLPYNQVNT 420 

KDIITLPYDQ NAPAAGQYNY QINPEQQSNL SQALAAMSNN PFKKVGMISS QNNNGALNGL 480

GVQVGYKQFF GESKRWGLRY YGFFDYNHGY IKSSFFNSSS DIWTYGGGSD LLVNFINDSI 540 

TRKNNKLSVG LFGGIQLAGT TWLNSQYMNL TAFNNPYSAK VNASNFQFLF NLGLRTNLAT 600 

AKKKDSERSA QHGVELGIKI PTINTNYYSF LGTKLEYRRL YSVYLNYVFA Y 651 

<212> Type : PRT

<211> Length : 651 

SequenceName : SEQ ID 111 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MLKLASKTIC LSLISSFTAV EAFQKHQKDG FFIEAGFETG LLQGTQTQEQ TIATTQEKPK 60 

PKPKPKPITP QSTYGKYYIS QSTILKNATE LFAEDNITNL TFYSQNPVYV TAYNQESAEE 120

AGYGNNSLIM IQNFLPYNLN NIELSYTDDQ GNWSLGVIE TIPKQSQIIL PASLFNDPQL 180 

NADGFQQLQT NTTRFSDAST QNLFNKLSKV TTNLQMTYIN YNQFSSGNGS GSKPPCPPYE 240 
NQANCVAKVP PFTSQDAKNL TNLMLNMMAV FDSKSWEDAV LNAPFQFSDN NLSAPCYSDY 300

LTCVNPYNDG LVDPKLIAKN KGDEYNIENG QTGSVILTPQ DVIYSYRVAN NIYVNLLPTR 3 60 

GGDLGLGSQY GGPNGPGDDG TNFGALGILS PFLDPEILFG KELNKVAIMQ LRDIIHEYGH 42 0 

TLGYTHNGNM TYQRVRMCEE NNGPEERCQG GRIEQVDGKE VQVFDNGHEV RDTDGSTYDV 4 80 

CSRFKDKPYT AGSYPNSIYT DCSQVPAGLI GVTSAVWQQL IDQNALPVDF TNIiSSQTNYL 540

NASLNTQDFA TTMIiSAISQS LSSSKSSATT YRTSKTSRPF GAPLLGVNLK MGYQKYFNDY 60 0 

LGLSSYGIIK YNYAQANNEK IQQLSYGVGM DVLFDFITNY TNEKNPKSNL TKKVFTSSLG 660 

VFGGLRGLYN SYYLLNQYKG SGNLNVTGGL MYRYKHSKYS IGISVPLVQL KSRIVSSDGA 72 0 

YTNSITLNEG GSHFKVFFNY GWIF 744 
<212> Type : PRT

<211> Length : 744 

SequenceName : SEQ ID 112 

SequenceDescription : 



Sequence



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MRKLFIPLLL FSALEANEKN GFFIEAGFET GLLEGTQTQE KRHTTTKNTY ATYNYLPTDT 60 

ILKRAANLFT NAEAISKLKF SSLSPVRVLY MYNGQLTIEN FLPYNLNNVK LSFTDAQGNV 120

IDLGVIETIP RHSKIVLPGE AFDSLKIDPY TLFLPKIEAT STSISDANTQ RVFETLNKIK 18 0 

TNLWNYRNE NKFKDHENHW EAFTPQTAEE FTNLMLNMIA VLDSQSWGDA ILNAPFEFTN 240 

SPTDCDMDPS KCVNPGTNGL VNSKVDQKYV LNKQDIVNKF KNKADLDVIV LKDSGWGLG 3 00 

SDITPSNNDD GKHYGQLGW ASALDPKKLF GDNLKTINLE DLRTILHEFS HTKGYGHNGN 3 60 

MTYQRVPVTK DGQVEKDSNG KPKDSDGLPY NVCSLYGGSN QPAFPSNYPN SIYHNCADVP 420

AGFLGVTAAV WQQLINQNAL PINYANLGSQ TNYNLNASLN TQDLANSMLS TIQKTFVTSS 480 

VTNHHFSNAS QSFRSPILGV NAKI GYQNYF NDFIGLAYYG 1 1 KYNYAKAV NQKVQQ^SYG 540 

GGIDLLLDFI TTYSNKNSPT GIQTKRNFSS SFGIFGGLRG LYNSYYVLNK VKGSGNLDVA 600 

TGLNYRYKHS KYSVGISIPL IQRKASWSS GGDYTNSFVF NEGASHFKVF FNYGWVF 657 




<212> Type : PRT 

<211> Length : 657 

SequenceName : SEQ ID 113 
SequenceDescription : 


Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MSLATSYNVS NNFSKFNIKR VRGYLICLVC MTPKMIQRGL NGVSFYGCSD YVNKGDCKGV 60
LREINGSMKM VCLHCENTPI MEKVESGRGG AYACKNCNRK FYFIDLAKQN ERKKDLEKEK 12 0 

KELLNKI EKQ KIKHLERFIL AGVKANIKEN SFFLGCKNYP KCEWTASMDS QDLKCPKCNR 180 
LMKRKKNFKN NEFFTATSLT LNAIEFCLYI NLKKKETNV 219 
<212> Type : PRT 

<211> Length : 219

SequenceName : SEQ ID 114 
SequenceDescription : 

Sequence 


<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

ME I KKYFL YA LFFLLFSGLF LSKLQAYKFN MSIVGKVSSY TKFGFNNQRY QPSKDIYPTG 60 
SYTSLLGELN LSMGLYKGLR AEVGAMMAAL PYDSTAYQGN NIPNGQPGSR TDPFGAGIFW 120 

QYIGWYAGHS GLNVQKPRLA MVHNAFLSYN YKKDKFSFGV KGGRYDAEEY DWFTSYTQGV 180
EGFVKYKDTR LRVMYSDARA SASSDWFWYF GRYYTSGKAL MIADLKYEKD NLKINPYFYA 240 
IFQRMYAPGI NITYDTNPNF NNKGFRFVGT FVGFFPIFAT PANQNDIILF QQVPLGKSGQ 3 00 

TYFFRTRFYY NKWQFGGSVY KNIGNANGDI GIYGDPLGYN IWTNSIYDAE INNIVGADVI 3 60 

NGFLYVGSQY RGFSWKILGR WTDSPRADER SLALFLSYFS NKYNI RMDLK LEYYGNITKK 420 

GYCIGYCGMY VPVDPNGPGT QPLTHNVYSD RSHIMFNITY GFRIY 465
<212> Type : PRT 
<211> Length : 465 

SequenceName : SEQ ID 115 
SequenceDescription : 


Sequence 
<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MKKTILLSLS LSLASSLLHA EDNGFFVSAG YQIGEAVQMV KNTGELKNLN DKYEQLSQSL 60
AQLASLKKSI QTANNIQAVN NALSDLKSFA SNNHTNKETS PIYNTAQAVI TSVLAFWSLY 120
AGNALSFHVT GLNDGSNSPL GRIHRDGNCT GLQQCFMSKE TYDKMKTLAE NLQKAQGNLC 180
ALSECSSNQS NGGKTSMTTA LQTAQQLMDL IEQTKVSMVW KNIVIAGVTN KPNGAGAITS 240
TGHVTDYAVF NNIKAMLPIL QQALTLSQSN HTLSTQLQAR AMGSQTNREF AKDIYALAQN 300

QKQILSNASS IFNLFNSIPK DQLKYLENAY LKVPHLGKTP TNPYRQNWL NKEINAVQDN 360
VANYGNRLDS ALSVAKDVYN LKSNQTEIVT TYNDAKNLSE EISKLPYNQV NVTNIVMSPK 420

DSTAGQYQIN PEQQSNLNQA LAAMSNNPFK KVGMISSQNN NGALNGLGVQ VGYKQFFGES 480

KRWGLRYYGF FDYNHGYIKS SFFNSSSDIW TYGGGSDLIiV NFINDSITRK NNKLSVGLFG 540
GIQLAGTTWL NSQYMNLTAF NNPYSAKVNA SNFQFLFNLG LRTNLATAKK KDSERSAQHG 600

VELGIKIPTI NTNYYSFLGT KLEYRRLYSV YLNYVFAY 638

<212> Type : PRT

<211> Length : 638

SequenceName : SEQ ID 116 
SequenceDescription : 



Sequence 


<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MKLKKRKVAA TLLKRLTLPL LFTTGSLGAV TYEVHGDFIN FSKVGFNRSP INPVKGIYPT 60
ETFVNLTGKL EGSVHLGRGW TVNVGGVLGG QVYDNTRYDR WAKDFTPPSY WDKTSCGTDS 120

LSLCMNATKM WQQQGPGGII DPRGIGYMYM GEWNGLFPNY YPANAYLPGH SRRYEVYKAN 180

LTYDSDRVHM VMGRFDVTEQ EQMDWIYQLF QGFYGTFKLT KNMKFLLFSS WGRGIADGQW 240
LFPIYREKPW GIHKAGIIYR PTKNLMIHPY VYLIPMVGTL PGAKIEYDTN PEFSGRGIRN 300

KTTFYVLYDY RWNNAEYGRY APARYNTWDP FLDNGKWRGL QGPGGATLYL HHHIDINNYF 360

WGGAYLNIG NPNMNLGTWG NPVALDGIEQ WVGGIYSLGF AGIDNITDAD AFTEYVKGGG 420

KHGKFSWSVY QRFTTAPRAL EYGIGMYLDY QFSKHVKAGL KLWLEFQIR AGYNPGTGFL 480
GPNGQPLNLN NGLFESSAFA QGPQNMGGIA KSITQDRSHL MTHISYSF 528

<212> Type : PRT 
<211> Length : 528 

SequenceName : SEQ ID 117 

SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKNFSPLYCL KKLKKRHLIA LSLPLLSYAN GFKIQEQSLN GTALGSAYVA GARGADASFY 60

NPANMGFTND WGENRSEFEM TTTVINIPAF SFKVPTTNQG LYSVTSLEID KSQQNILGII 120

NTIGLGNILK ALGNTAATNG LSQAINRVQG LMNLTNQKW TLASKPDTQI VNGWTGTTNF 180

VLPKFFYKTR THNGFTFGGS FTAPSGLGMK WNGKGGEFLH DVFIMMVELA PSMSYTINKR 240

FSVGVGLRGL YATGSFNNTV YVPLEGASVL SAEQILNLPN NVFADQVPSN MMTLLGNIGY 300

QPALNCQKAG GDMSDQSCQE FYNGLKKIMG YSGLIKASAN LYGTTQWQK SNGQGVSGGY 360

RVGSSLRVFD HGMFSWYNS SVTFNMKGGL VAITELGPSL GSVLTKGSLN INVSLPQTLS 420

LAYAHQFFKD RLRVEGVFER TFWSQGNKFL VTPDFANATY KGLSGTVASL DSETLKKMVG 480

LANFKSVMNM GAGWRDTNTF RLGVTYMGKS LRLMGAIDYD QAPSPQDAIG IPDSNGYTVA 540

FGTKYNFRGF DLGVAGSFTF KSNRSSLYQS PTIGQLRIFS ASLGYRW 587



<212> Type : PRT 

<211> Length : 587 

SequenceName : SEQ ID 118 
SequenceDescription : 


Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MAFQVNTNIN AMNAHVQSAL TQNALKTSLE RLSSGLRINK AADDASGMTV ADSLRSQASS 60

LGQAIANTND GMGIIQVADK AMDEQLKILD TVKVKATQAA QDGQTTESRK AIQSDIVRLI 120

QGLDNIGNTT TYNGQALLSG QFTNKEFQVG AYSNQSIKAS IGSTTSDKIG QVRIATGALI 180

TASGDISLTF KQVDGVNDVT LESVKVSSSA GTGIGVLAEV INKNSNRTGV KAYASVITTS 240

DVAVQSGSLS NLTLNGIHLG NIADIKKNDS DGRLVAAINA VTSETGVEAY TDQKGRLNLR 300

SIDGRGIEIK TDSVSNGPSA LTMVNGGQDL TKGSTNYGRL SLTRLDAKSI NWSASDSQH 360

LGFTAIGFGE SQVAETTVNL RDVTGNFNAN VKSASGANYN AVIASGNQSL GSGVTTLRGA 420 

MWIDIAESA MKMLDKVRSD LGSVQNQMIS TVNNISITQV NVKAAESQIR DVDFAEESAN 480 
FNKNNILAQS GSYAMSQANT VQQNILRLLT 510
<212> Type : PRT 
<211> Length : 510 

SequenceName : SEQ ID 119 
SequenceDescription :



Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MAGTQAIYES SSAGFLSQVS SIISSTSGVA GPFAGIVAGA MTAAIIPIW GFTNPQMTAI 60
MTQYNQSIAE AVSVPMKAAN QQYNQLYQGF NDQSMAVGNN ILNISKLTGE FNAQGNTQSA 120

QISAVNSQIA SILASNTTPK NPSAIEAYAT NQIAVPSVPT TVEMMSGILG MITSAAPKYA 180

LALQEQLRSQ ASNSSMNDTA DSLDSCTALG ALVGSSKVFF SCMQISMTPM SVSMPTVYAK 240

YQAVATKALT SGVNPMTTPA CPIGDKVLAV YCYAEKVAEI LREYYIEFVK NNTNLLQNAS 300

QMILNQSGIiA TSTYDTQAIS NISSLYNYNI VANKSFLKSH LTYLDYIKDK LKGQKDSYLT 360

ERVQTKIIVK 370

<212> Type : PRT 
<211> Length : 370 

SequenceName : SEQ ID 120

SequenceDescription : 



Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MTNEAINQQP QTEAAFNPQQ FINNLQVAFI KVDNWASFD PMQKPIVDKN DRDNRQAFEK 60 

ISQLREEFAN KAIKNPTKKN QYFSSFISKS NDLIDKDNLI DTGSSIKSFQ KFGTQRYQIF 120

MNWVSHQNDP SKINTQKIRG FMENIIQPPI SDDKEKAEFL RSAKQAFAGI IIGNQIRSDQ 180 

KFMGVFDESL KERQEAEKNG EPNGDPTGGD WLDIFLSFVF NKKQSSDLKE TLNQEPVPHV 240

QPDVATTTTD IQSLPPEARD LLDERGNFSK FTLGDMNMLD VEGVADIDPN YKFNQLLIHN 300

NALSSVLMGS H2TGI EPEKVS LLYGNNGGPE ARHDWNATVG YKNQRGDNVA TLINVHMKNG 360

SGLVIAGGEK GINNPSFYLY KEDQLTGSQR ALSQEEIQNK VDFMEFLAQN NAKLDMLSKK 420

EKEKFQNEIE DFQKDSKAYL DALGNDHIAF VSKKDKKHLA LVAEFGNGEL SYTLKDYGKK 480 

ADKALDREAK TTLQGSLKHD GVMFVDYSNF KYTNASKSPD KGVGATNGVS HLEAGFSKVA 540

VFNLPNLNNL AITSWRQDL EDKLIAKGLS PQEANKLVKD FLSSNKELVG KALNFNKAVA 600 

EAKNTGNYDE VKQAQKDLEK SLKKRERLEK DVAKNLESKS GNKNKMEAKS QANSQKDEIF 660 

ALINKEANRD ARAIAYAQNL KGIKRELSDK LENINKDLKD FSKSFDEFKN GKNKDFSKAE 720 

ETLKALKGSV KDLGINPEWI SKVENLNAAL NEFKNGKNKD FSKVTQAKSD LENSIKDVII 780

NQKITDKVDN LNQAVSVAKA TGDFSGVEQA LADLKNFSKE QLAQQAQKNE DFNTGKNSAL 840

YQSVKNGVNG TLVGNGLSKA EATTLSKNFS DIKKELNAKL GNFNWNIJNNG LENSTEPIYT 900 

QVAKKVKAKI DRLDQIASGL GDVGQAASFL LKRHDKVDDL SKVGLSANHE PIYATIDDLG 960 

GPFPLKRHDK VDDLSKVGLS REQKLTQKID NLNQAVSEAK ASHFDNLDQM IDKLKDSTKK 1020

NWNLYVESA KKVPTSLSAK LDNYATNSHT RINSNVKNGT INEKATGMLT QKNSEWLKLV 1080

NDKIVAHNVG SAPLSAYDKI GFNQKNMKDY SDSFKFSTRL SNAVKDIKSG FVQFLTNIFS 1140

MGSYSLMKAS VEHGVKNTNT KGGFQKS 1167 
<212> Type : PRT 
<211> Length : 1167 

SequenceName : SEQ ID 121 

SequenceDescription :



Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKTNGHFKDF AWKKCFLGAS WALLVGCSP HIIETNEVAL KLNYHPASEK VQALDEKILL 60
LRPAFQYSDN IAKEYENKFK NQTTLKVEEI LQNQGYKVIN VDSSDKDDFS FAQKKEGYLA 120
VAMNGEIVLR PDPKRTIQKK SEPGLLFSTG LDKMEGVLIP AGFVKVTILE PMSGESLDSF 180
TMDLSELDIQ EKFLKTTHSS HSGGLVSTMV KGTDNSNDAI KSALNKIFAS IMQEMDKKLT 240

QRNLESYQKD AKELKNKRNR 260
<212> Type : PRT 
<211> Length : 260 

SequenceName : SEQ ID 122 
SequenceDescription : 


Sequence 
<213> OrganismName : Mycoplasma j
<400> PreSequenceString :
MKSKLKLKRY LLFLPLLPLG TLSLANTYLL QDHNTLTPYT PFTTPLNGGL DWRAAHLHP 60
SYELVDWKRV GDTKLVALVR SALVRVKFQD TTSSDQSNTN QNALSFDTQE SQKALNGSQS 120
GSSDTSGSNS QDFASYVLIF KAAPRATWVF ERKIKLALPY VKQESQGSGD QGSNGKGSLY 180
KTLQDLLVEQ PVTPYTPNAG BARVNGVAQD TVHFGSGQES SWNSQRSQKG LKNUPGPKAV 240
TGFKLDKGRA YRKLNESWPV YEPLDSTKEG KGKDESSWKN SEKTTAENDA PLVGMVGSGA 300
AGSASSLQGN GSNSSGLKSL LRSAPVSVPP SSTSNQTLSL SNPAPVGPQA WSQPAGGAT 360
AAVSVNRTAS DTATFSKYLN TAQALHQMGV IVPGLEKWGG NNGTGWASR QDATSTNLPH 420
AAGASQTGLG TGSPREPALT ATS QRAVTW AGPLRAGJSTSS ETDALPNVIT QLYHTSTAQL 480
AYLNGQIWM GSDRVPSLWY WWGEDQESG KATWWAKTEL NWGTDKQKQF VENQLGFKDD 540
SNSDSKNSWL KAQGLTQPAY LIAGLDWAD HLVFAAFKAG AVGYDMTTDS SASTYNQALA 600
WSTTAGLDSD GGYKALVENT AGLNGPINGL FTLLDTFAYV TPVSGMKGGS QNNEEVQTTY 660
PVKSDQKATA KIASLINASP LNSYGDDGVT VFDALGLNFN FKLNEERIiPS RTDQLLVYGI 720
VNESELKSAR ENAQSTSDDN SNTKVKWTNT ASHYLPVPYY YSANFPEAGN RRRAEQRNGV 780
KISTLESQAT DGFANSLLNF GTGLKAGVDP APVARGHKPN YSAVLLVRGG WRLNFNPDT 840
DKLLDSTDKN SEPISFSYTP FGSAESAVDL TTLKDVTYIA ESGLWFYTFD NGEKPTYDGK 900
QQQVKNRKGY AVITVSRTGI EFNEDANTTT LSQAPAALAV QNGIASSQDD LTGILPLSDE 960
FSAVITKDQT WTGKVDIYKN TNGLFEKDDQ LSENVKRRDN GLVPIYNEGI VDIWGRVDFA 1020
ANSVLQARNL TDKTVDEVIN NPDILQSFFK FTPAFDNQRA MLVGEKTSDT TLTVKPKIEY 1080
LDGNFYGEDS KIAGIPLNID FPSRIFAGFA ALPSWVIPVS VGSSVGILLI LLILGLGIGI 1140
PMYKVRKLQD SSFVDVFKKV DTLTTAVGSV YKKIITQTSV IKKAPSALKA ANNAAPKAPV 1200
KPAAPTAPRP PVQPPKKA 1218
<212> Type : PRT

<211> Length : 1218

SequenceName : SEQ ID 123
SequenceDescription :



Sequence 


<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MHQTKKTALS KSTWILILTA TASLATGLTV VGHFTSTTTT LKRQQFSYTR PDEVALRHTN 60 

AINPRLTPWT YRNTSFSSLP LTGENPGAWA LVRDNSAKGI TAGSGSQQTT YDPTRTEAAL 120 

TASTTFALRR YDLAGRALYD LDFSKLNPQT PTRDQTGQIT FNPFGGFGLS GAAPQQWNEV 180

KNKVPVEVAQ DPSNPYRFAV LLVPRSWYY EQLQRGLGLP QQRTESGQNT STTGAMFGLK 240 

VKNAEADTAK SNEKLQGAEA TGSSTTSGSG QSTQRGGSSG DTKVKAIiKIE VKKKSDSEDN 3 00 

GQLQLEKNDL ANAPIKRSER SGQSVQLKAD DFGTALSSSG SGGNSNPGSP TPWRPWLATE 360 

QIHKDLPKWS ASILILYDAP YARNRTAIDR VDHLDPKAMT ANYPPSWRTP KWNHHGLWDW 420 

KARDVLLQTT GFFNPRRHPE WFDGGQTVAD NEKTGFDVDN SENTKQGFQK EADSDKSAPI 480

ALPFEAYFAN IGNLTWFGQA LLVFGGNGHV TKSAHTAPLS IGVFRVRYNA TGTSATVTGW 54 0 

PYALLFSGMV NKQTDGLKDL PFNNNRWFEY VPRMAVAGAK FVGRELVLAG TITMGDTATV 600 

PRLLYDELES NLNLVAQGQG LLREDLQLFT PYGWANRPDL PIGAWSSSSS SSHNAPYYFH 660 

NNPDWQDRPI QNWDAFIKP WEDKNGKDDA KYIYPYRYSG MWAWQVYNWS NKLTDQPLSA 720 

DFVNENAYQP NSLFAAILNP ELLAALPDKV KYGKENEFAA NEYERFNQKL TVAPTQGTNW 780

SHFSPTLSRF STGFNLVGSV LDQVLDYVPW IGNGYRYGNN HRGVDDITAP QTSAGSSSGI 840 

STNTSGSRSF LPTFSNIGVG LKANVQATLG GSQTMITGGS PRRTLDQANL QLWTGAGWRN 900 

DKASSGQSDE NHTKFTSATG MDQQGQSGTS AGNPDSLKQD NISKSGDSLT TQDGNAIDQQ 960 

EATNYTNLPP NLTPTADWPN ALSFTNKNNA QRAQLFLRGL LGSIPVLVNR SGSDSNKFQA 1020 

TDQKWSYTDL HSDQTKLNLP AYGEVNGLLN PALVETYFGN TRAGGSGSNT TSSPGIGFKI 1080

PEQNNDSKAT LITPGLAWTP QDVGNLWSG TTVSFQLGGW LVTFTDFVKP RAGYLGLQLT 1140 

GLDASDATQR ALIWAPRPWA AFRGSWVNRL GRVESVWDLK GVWADQAQSD SQGSTTTATR 12 00 

NALPEHPNAL AFQVSWEAS AYKPNTSSGQ TQSTNSSPYL HLVKPKKVTQ SDKLDDDLKN 12 60 

LLDPNQVRTK LRQSFGTDHS TQPQPQSLKT TTPVFGTSSG NLSSVLSGGG AGGGSSGSGQ 132 0 

SGVDLSPVEK VSGWLVGQLP STSDGNTSST NNLAPNTNTG NDWGVGRLS ESNAAKMNDD 1380

VDGIVRTPLA ELLDGEGQTA DTGPQSVKFK SPDQIDFNRL FTHPVTDLFD PVTMLVYDQY 1440 

IPLFIDIPAS VNPKMVRLKV LSFDTNEQSL GLRLEFFKPD QDTQPNMNVQ VNPNNGDFLP 1500 

LLTASSQGPQ TLFSPFNQWP DYVLPLAITV PIWIVLSVT LGLAIGIPMH KNKQALKAGF 1560 

ALSNQKVDVL TKAVGSVFKE IINRTGISQA PKRLKQTSAA KPGAPRPPVP PKPGAPKPPV 1620 

QPPKKPA 1627
<212> Type : PRT 
<211> Length : 1627 

SequenceName : SEQ ID 124
SequenceDescription :


Sequence 
<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString : 
MGYKLKRWPL VAFTFTGIGL GWLAACSAL NTSNLFPRQN RSKQLIGFTE NNIIKPEAVL 60
KAALAEDNGT ETILRVNFGE ALKSWYQNNK DRNIATRLTI FSENVEDEHD NLLDQKQQAE 120
PINWPIELQK EYDQWGGSES SWKALKLYDR LIADFQSLIF SNIVANVQLT DGSDQFKPTT 180
KDNLDSTSNK IKFVNSKPND PNGEFFANLQ AYLFAQWWE ENPLPLTQAF FAYQAPKDGL 240
DSLYDQAAIG SALQLGYAFP AFREPNNGQS QGKTTFDPTP NSAQNFGDFI KAVFPEQKNG 300
QTQQSNTSSR TGLFDWQTKW NTNGAANKLL VTKSNLRGAF KGVGLATAII DQYEYLVGGS 360
KTSSLPEVKV DSNKSNQNPL DSFFMEGKDA VAIRSIVSRA KIAMTDQTPG FKVNPAFVKV 420
KQSQQNDTFY QNQRKLSGGQ SGDNNSQGKH HYLQDAVRLT SSQAMAAAST GADSSSGTNV 480
GGSSGGNSVL IPLPRSAALT HTQQQVQQTT STLQTPVYAR GDDGTYALAI DGGDYFLANN 540
KRDFTKQADI LLYRYLQAKS NNFKENGVEF SLNLLESGSL FQTWAQTGLT AKLYGALVAM 600
MGSGQGTQVK GSVQGSSRAA SVSVQTTQQN RQQSTDTQES EWKLAKSLL KSSADLAKPF 660
TDNPTFKKAL TOIQSEYKDY LAAAGKLSEF KKDLGEVSGL jQQAIIDRADK YIQLEKQAQK 720
SAIGLGQPLP YQRASDGSYP ALEKFFIPED SAADGKVKAS ESGSAALVTL KTTDSQKSTN 780
TVKQPDIKPT RENNDKKLKQ LTSDVETKAS SLITKWGATP QIGSQFSEIV SLKSKDNKPQ 840
TNMILALLSD VGIKWTKILN SFKEWFFTNT NDFKNNYDSE KKELKGNEYK DFNDLVKQTL 900
YLRSWQRLTS KEKFGYYKEL GSVKAQAAQS GMVSLSSSAA VANAVASSGM QKSGDQTLLE 960
LGKKAFESEL EASSSDGQYK YLRFLSTLMW LVKDGAKNYK RLLQQAITVG TRAFVSWTVS 1020
YDDTATASAA AAKAQVAVLK TAQATNTQSD NPFNKFVQNP DYVQGSETNW FNDKSTPIKP 1080
DSLLESESTY NFTAEPFDDK TKSQKRSTGG TTNEKHFFGF NGLTINSPQS VSTASAGLTE 1140
QIFNNFGQLV TSSDKSGALS QYKDKATLKR LIQNTNSDAE LNAFGEVLHR AVNVDTSNLG 1200
RFNSSGEPLI SFDNKKKFLV DWDKLDDVY FNKFEGYVGQ TKVKMSDSSS SSQGTKTIRK 1260
PKPHHSPRTR VSRLWAMSFR LPTRTLTKFL LVEKLIRTVL 1300
<212> Type : PRT
<211> Length : 1300

SequenceName : SEQ ID 125
SequenceDescription :

Sequence

<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MKKLLIKPQF WFLTLGGFIS SSVILVACAT PSNSALQTVF KARSNQFFNG EQGSLQNALA 60
TALKDPEANK QFVAAPLLKA LTAWYENNQD KQVTQFFKDT KKSVDEQYNQ AVDKWSASR 120
NKNLFVQQDL LDSAGGVRNL KS PEWWTAH 150
<212> Type : PRT
<211> Length : 150

SequenceName : SEQ ID 126
SequenceDescription :

Sequence

<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MQQQGETKDQ YNTFGLRLVR NSVGVSVLGL DGFVKFIKGG SGGSNGGSSS AKKIDKEEQK 60
KFLKFRAFQA KIGTFYNTNF AFSFPLNETL KGWFDKHRGL ILANALVKVT LDTKEKASKA 120
LVDAFSSYKN WLSEYTPVGL ATTMISFYFD QMKALNNKLL ERVRSLNQNV NQANPTPWLN 180
GLSAKLPYVN TNGNYEKLNN YFTFLITKVL WPKVGTEDTN VSEEKSKLKT KTEDVNKIRE 240
KILNNIDSKL KTFVQKLKPT LAPRPAYSNV ILLNINNDKV WSAGANWSLA VLLDPKKVNP 300
LSFMLLKQMF DQNSLFKKAK TLFENIQNKA KTSGSGKSGT TTNDDADALS KVIGNYYYNT 360
WAKLTDKSIY GNLKDDKFDD LFKLAFDSSI NEKSFNVDYK AVIEHYRFIY TLEWLVDKNL 420
KNFKDLLKAN LKFGEIAFIA YKNTETQNFS NPQGIFGSYF NYENETNAAK SATQIIDPNS 480
FFYKTTTKPE AKTTQSANTA VMVQNTQMNN QQTNSYGFTG LSTSSGSMLG AATQQAILDQ 540
ITKTSLQQYG SQADLKKIIG ETKNQLLLDR IANQLIALKP NTSGNSGTQK TIAAYFQTDA 600
VGNPTLDFKA KQKLLLDVLD QYKDFFGNNA QAVQRDSGKS GTGNYLTYTD GSDKITYLQF 660
SYKDIDGLSL SSSNGTSSKF ASDWAALLL FQAAYKGTQQ LALSSINKPQ LPIGDKRIKT 720
GIDLLK 726
<212> Type : PRT
<211> Length : 726

SequenceName : SEQ ID 127
SequenceDescription :

Sequence

<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MKSFLRKPKF WLLLLGGLST SSIILSACAT PSNSALQAVF KPTSNQFFNG EHGTIQSALN 60
TALRDPETNK KFVAAPLLKA LEAWYENNQD KNITQFLKDT KTNVDNQYKT WDKWSAPR 120
NKSLFVQQDL LDSSGGSEAT WKARKLFEQL ISDFASRVFQ KNYLSYKENG KVSAGPFLYD 180
TISKNSNWQN IVFDAVNFPE TNDDFFAKIQ SEVFDQWAEY TDPTIISSVT LKYSAPN 237

<212> Type : PRT 

<211> Length : 237 

SequenceName : SEQ ID 128
SequenceDescription :

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 

<400> PreSequenceString :

MINFLFNQMN ALNNKFLERA KALNQNVNQA NPTPWLNGLS AKLPYVRTNG NYEKLNNYFT 60 

FLIVKYMWKK VGNEDASLSK DSSINKLKTK TEDVNKIRDK ILEDIQKKVQ EFVKNKLKPT 120 

LAPRQTYSNV ILLNVNNDKV WSMGANWALA NLLDTSKINP LSFMLLKQTF DQNDLFKKAK 18 0 

KLFEDIQSKT NGGSSGGMQG SNTSSSEGAD ALSKVIGNYY YNSWAKLTDK SIYGNPKDNK 240 

FDDLFKLAFE DSINEKSFNV DYKAVIEHYR FIYTLEWLVN GNLKNFKDLL KANLKFGEIA 3 00 

FIAYKNTETK EFSNPQGVFG SAFNYENETN EVKIAAQNLD PNNFFYKTTT KPEEVKTAQM 360 

GASMMVMQQK MQSTMQDSNH YGFTGLNTST SSMLGAATQQ AILDQITKNS LQQYGSQQEL 42 0 

RTLIEKTNNQ LLLDRIASQL SGLNPSTTGN SNNGKGKNIA TYFQLDAIGN PTLSFQQKRK 480 

LLLDVLDQYK DFFGTNTQAA QRDSGKGGHG SYSTYQDGSD KITYLQFSYK DIDNLSLSDK 540 

GNSKLASDW AALLLFQAAD KGTQQLALSA IN 572 
<212> Type : PRT 
<211> Length : 572 

SequenceName : SEQ ID 129

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString : 

MKKFLRKPQF WKLTLGGFLS TSVILAACAT PSNSALQTVF KARSSQFFNG EQGSLQSALT 60 

TALKNPVANK QFIAAPLLKA LEAWYENNED KKITQFLKDT KSNVDSQYTT AVDKWSASR 120 

NKSLFVQQDIi LDNAGGS EAT WKAQKLLEQL ISDFASRVFQ KNYLNYKKDG QVSTGPFTYD 180 

ELHKEESWKN FEFSAPRFSE TNDDFFAKIQ SQVFDQWVEY TDPTLISQVN YKYSAPSQGL 240 

GQ I YNREKLK DKLTPSYAFP FFAEEKDIAP NQNVGNKRWK QLVKGEGAIT DNNIGQSGTN 3 00 

SQKTGLLKYR NESNKGDFLD FPLNLSDTNE TKQLVDASNI VDQLEAANLG AALNLKLQVF 3 60 

EQDNDELPQI KELKEDLNNT IWDKSKDVE KASKTNALFY NDQEGKQQQS DSDPIAGALD 420 

DIFAQNTSEG TNLSKLAEQV KKAAATKMEA KTAVLRTNNS KGQQNNYWL DAAIPTFNST 480 

TSKSKNNSAS NEVLVALKSG SINLRQVQQT DQNSYSPIKF RIVRNSTGVT VFGLDGGSYY 540 

LKQD STNKKS VSKQSLTLLT KSSSGNSNKV LRDLDKQKQF LKFRAFQAKT NTFYSTNFAF 600 

SFPLNETLKS WFDKHRELIL ANALVNASLD QKDKASKALT EAFNPYKELI KEFAPVALAT 660 

TMISFYFDQM KALNNKLLER ARNLNQNVNQ ANPTPWLNGL SAKLPYVNTN GNYEKLNNYF 720 

TFLITKTLWP KVGQEETSIS EESNKLKTKT ADVDKI RDKI LENIQTKVND FVKNKLKPAL 78 0 

APRPAYSNVI LLNVNNDKVL SSGANWSLAS LLQSDKVNPL SFMLLKQAFD NNDLFKKAQK 840 

LFKDIQEKSS NNGGMQSSST TNSDADALSK VIGNYYYTTW AKLTDKS I YG NPKDNKFDEL 900 

FKLAFEASID EKSFNVDYKA VIDHYRFIYT LQWLVDQKLK NFKSLLKTNL KFGEVAF I A Y 960 

KNTETTNFSN PQGVFGSYFN YENS AS EVKE STQTLDPNNF FYKTTTKPTV QAIQQVASLA 1020 

LVQKQQMQQN STDHYGFTGL STSTSSMFDA SSRDAILQQI TKTSLQQYGS KDQLKKIIQG 10 80 

TNNQLLLDRI AVQLSGLNPS TTNGGSGKTI ATYFQVDAVG NPTLDFQAKR KLLLDLLDQY 1140 

QNYFGNGAQK SQRDSTPSGT GNYLTYQNGS DKYTYTQFTY QDIDSLSLTT TSGTNNKIAS 12 00 

DWAALLLFQ AADKGTQ QLA LSAINKPQLN IGDKRIESGL KLLK 1244 
<212> Type : PRT 
<211> Length : 1244 

SequenceName : SEQ ID 130

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString : 

MVGSGAAGSA SSLQGNGSNS SGLKSLLRSA PVSVPPSSTS NQTLSLSNPA PVGPQAWSQ 60 

PAGGATAAVS VNRTASD TAT FSKYLNTAQA LHQMGVIVPG LEKWGGNNGT G WAS RRD AT 120 

STNLPHAAGA SQTGLGTGSP REPALTATSQ RAVTWAGPL RAGNSSETDA LPNVITQLYH 180 

TSTAQLAYLN GQIWMSSAR VPSLWYWWG EDQESGKATW WAKTELNWGT DKQKQFVENQ 240 
LGFKDDSNSD SKNSNLKTQG LTQPAYLIAG LDWADHLVF AAFKAGAVGY DMTTDSNAST 300

YNQALWSTT AGLDSDGGTR LW 322

<212> Type : PRT 
<211> Length : 322 
SequenceName : SEQ ID 131

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString : 

MPVFLKLTHT IRKVLRVARL SRLAIiLSLTA VIFSGCANIN LISAVGSSSV QPLLSKLSSH 60 

YVLNHNDKDN LVEISVQAGG SSAGVKAITK GLADIGNVSK NTKSYAEENK QLWMDKKLKT 120 

X T.LGKDAI AV IYKAPSEFKG KLVLTKDNLN DLYDLFAGSK SVDINKFVEN GQTXKNSNHN - ^18.0 

LIGFPRTGGA FASGTAEAFL KFSGLTQTKT LDKDSKEILE GQRNYGPNAR PTSETNIEAF 240

NT FVTTLRQ P NLYGMVYLSL GFVNNNMNLI KSEGFEVLKV KYDNNAVTPS SQAVSSNTYK 3 00 

WVRPLNSWS LLPKQKTLPS IQRFFNWLLF SNNSEIKKIY DDFGVLELTA DEKKKMF KTG 3 60 

NAEMSNIANF WVDDYSLNNQ TFGAL 385 
<212> Type : PRT 
<211> Length : 385

SequenceName : SEQ ID 132 

SequenceDescription : 

Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFAVLPPEI NSARLYVGAG LAPMLDAAAA WDGLADELGS AAASFSAVTA GLAGS S WLGA 60 

AS TAMTGAAA PYLGWLSAAA AQAQQAATQT RLAAAAFEAA LAATVHPAII SANRALFVSL 12 0 

30 WSNLLGQNA PAIAATEAAY EQMWAQDVAA MFGYHAGASA AVSALTPFGQ ALPTVAGGGA 180 

LVSAAAAQVT TRVFRNLGLA NVGEGNVGNG NVGNFNLGSA NIGNGNIGSG NIGSSNIGFG 240 

NVGPGLTAAL NNIGFGNTGS NNI GFGNTGS NNIGFGNTGD GNRGIGLTGS GLLGFGGLNS 3 00 

GTGNIGLFNS GTGNVGI GNS GTGNWGI GNS GNS YNTGFGN SGDANTGFFN SGIANTGVGN 360 

AGNYNTGSYN PGNSNTGGFN MGQYNTGYLN SGNYNTGLAN S GNVNTGAF I TGNFNNGFLW 42 0 

35 RGDHQGL I FG SPGFFNSTSA PSSGFFNSGA GSASGFLNSG ANNSGFFNSS SGAIGNSGLA 480 

NAGVLVSGVI NSGNTVSGLF NMSLVAITTP ALISGFFNTG SNMSGFFGGP PVFNLGLANR 540 

GWNILGNAN IGNYNILGSG NVGDFNILGS GNLGSQNILG SGNVGSFNIG SGNIGVFNVG 600 

SGSLGNYNIG SGNLGIYNIG FGNVGDYNVG FGNAGDFNQG FANTGNNNIG FANTGNNNIG 660 

IGLSGDNQQG FNIASGWNSG TGNSGLFNSG TNNVGIFNAG TGNVGIANSG TGNWGIGNPG 720 

40 TDNTGI L WAG SYNTGILNAG DFNTGFYNTG SYNTGGFNVG NTNTGNFNVG DTNTGSYNPG 78 0 

DTNTGFFNPG NVNTGAFDTG DFNNGFLVAG DNQGQIAIDL SVTTPFIPIN EQMVIDVHNV 84 0 

MTFGGNMITV TEASTVFPQT FYLSGLFFFG PVNLSASTLT VPTITLTIGG PTVTVPISIV 90 0 

GALESRTITF LKIDPAPGIG NSTTNPSSGF FNSGTGGTSG FQNVGGGSSG VWNSGLSSAI 960 

GNSGFQNLGS LQSGWANLGN SVSGFFNTST VNLSTPANVS GLNNIGTNLS GVFRGPTGTI 102 0 

45 FNAGLANLGQ LNIGSANLGD FNLGS GNVGS FNVFSGNQGS YNIGPANLGN YNIGFANLGN 1080 

YNIGFGNAGD FNQGFANTGN NNIGFANTGN NNIGIGLSGD NQQGFNFAGG WNSGTANIGL 1140 

FNSGTNNVGI GNSGTGNWGI GNSGSGNTGI GNTGSTNTGF FNTGIVNTGV ANAGSYNTGW 12 0 0 

YNTGDTNTGI ANLGDFNTGF YNTGNFSTGF ANQGDIATGA FITGDMGNGA FWRGDQQGLF 12 60 

SAGYRVHVPE IPAHVTVEVP VNIPITASFT NTVYSGITLE QINFGFTIDI AGIPLLAGAI 1320 

50 SKAVLPPITG TGPAITVNIG DPGGSTAIRI PATASVGPFD VTFVNIAATT GFFNATTDPS 1380 

SGFFNGGPGT VSGIANIGAN ISGFQNVANS ATS GFNNYGS LQSGLANLGD TVS GVFNTGI 1440 

GAPANVSGMF NIGSNLAGFF HDQATGMSMF NLGLGNIGQF NVGFSNVGDS NAGLANIGSF 15 0 0 

NLGSGNLGSF NVFGGNQGSY NIGPANLGNY NIGLGNLGSY NFGFGNAGDF NLGFANTGNN 1560 

NI GFANTGNN NIGIGLSGDN QQGFNFAGGW NSGSGNSGLF NSGTNNIGLF NSGTGNIGIG ' 1620 

55 NS GTGNWGI A NTGDTNTGIF NTGDVNTGLL NAGNVNTGIF NTGHYNTGSF NAGS FNTAGF 1680 

NPGS YNTGYL NTGS YNTGL A NSGDVNTGGF ITGNYSNGFW WRGDYQGLAG ISQTITVPDT 1740 

AVPVKLHVPI FLDIPVTGTL GTFTVHGFRF PEITGDIFLI GIPFNAATLD AFSFPNISIV 180 0 

LPNIGINLGS GPDPLIDIAG TGGLLPIKIP LIDIPAAPGF GNSTTTPSSG FFNAGTGTVS 18 60 

GVGNVGSNSS GFFNLTSGSS GISGVQNFGE LISGGFNFGN TVS GL VNA S T LGLSMPANLS 1920 

60 GGGNVGATVA GFVNNTQILN LGFGNVGSGN VGHGNIGDSN VGLGNLGNAN VGHGNIGSFN 1980 

VFSGNRGSYN IGPANLGNYN IGLGNLGSYN FGFGNAGDFN LGFANSGSNN IGFANTGNNN 204 0 

IGIGLSGHNQ QGFGSWNSGT ANTGLFNSGT NNIGLFNSGT GNIGIGNSGI GNTGIGNPGV 2100 

GNTGLGNSGT GNWGLWNPGT GNMGVANVGT YNTGGYNVGS TNTGIANVGI ANTGS YNTGS 2160 

TNTGSFNDGD FNTGFYNTGD YNTGFYNTGD VNTGAFIGGN FSNGAFWQSD HQ GQWGAHYA 222 0 

65 ITVPQIPLLN FSLNIPVNIP IHLDFGTLAV NGFQIPAITL RALGVTHF S V GPIIVPRIAG 22 80 

TLPVIDINIG DPGGSSSIPI TITSGAGPW IPLLDIPPAP GFGNSTTGPS SGFFNSGTGS 2340 

SSGFGNVGAN NSGFWNTAFA GIGNSGLQNF GSLQSGWANL GNTVSGFYNT SAADFATPAN 24 00 
LSGLSNVGAD LTGVLRGPNG STFNAGLANL GQFNVGSANL GSANLGSANL GSANLGNSNV 2460
GFGNIGNANI GGANIGDFNV GIANTGPGLT AAVNNIGIGN TGNYNIGVGN TGNYNIGFGN 2520
TGNNNIGIGL SGDNQIGFGP LNAGIANMGL FNLGDNNFGM ANAGNFNQGI ANTGNNNIGL 2580
FNTGNNNVGI WLTGDGLSGF SSLNSGAGNT GFFNSGTANT GLFNSGTGNT GLFNSGTGNV 2640
GIGNMGTGGF GVGLSGDSQV GIGGTNSGSF NIGLFNSGTG NVGIGNSGTG NVGIGNTGTG 2700
NTGIGNSGNY NTGLLNAGLV NTGIANPGNH NTGLFNIGTF NTGIANPGHY NTGSYNTGSY 2760
NTGMANAGDY GTGAFITGSM NNGLLWRADR QGLLAANYTI TIERPAAFLM VDIPVNIPIT 2820
GDITNVSIPA ITFPRIDASG SVDIGILSGT VLAPVGPITL HGGDASAPLD TPIEIDFGPS 2880
PAINLNIGKP DGSTVINIVG GAGAGPISIP IIDLRPAPGF FNATTGPSSG FLNWGAGSAS 2940
GLLNFGNNSG LYNFATSSMG NSGFQNYGSL QSGWANLGNS ISGIYNTGLG APANVSGLLN 3000
IGTNLAGWLQ NGPTETTFSV GLANLGFWNL GSANIGNYNL GSANIGVYNL GSANIGDFNL 3060
GSANIGDFNL GSANIGSSNI GFGNVGPGLT AAIGNIGFGN TGNGNIGIGN TGTGNIGFGN 3120
TGNGNIGIGL TGDTMTGFGG WNSGTGNIGL FNSGTGNIGF GNSGTGNWGI GNSGDYNTGI 3180
GNTGSTNSGF PNTGLVNTGX GNSGDYNTGL FNAGNTNTGS FNPGDYNTGG FNPGNYNTGY 3240
FNPGNSNTGI ANSGDVNTGA FNSGNYSNGF FWRGDYQGLG GFAYQSAVSE IPWSYDRFQH 3300



<212> Type : PRT 
<211> Length : 3300 

SequenceName : SEQ ID 133
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MNLVSTTSGM SGFLNVGALG SGVANVGNTI SGIYNVGTSD LSTPAVNSGL ANIGTNIAGL 60 

LRD GAGTAA I NLGLANHGNL NVGFAS LGGF NFGGATIGHN NVGIGNTGIF DVGLANLGSY 120 

NIGFGNLGDD NLGFGNFGSY NIGFGNVGND NLGFANAGGG NIGFANTGSN NVGFGNTGSN 180 

NVGIGLTGNG QIGFGSFNSG SGNIGLFNSG SNNIGFFNSG SGNFGIANSG SFNTGIGNTG 240 

30 NTNTGLFNSG DVNTGAFNPG SFNTGSFNTG SFNTGGFNPG NTNTGYLNIG NYNTGIANTG 3 00 

DVDTGAF I TG NYSNGLFLSG DYQGLVGLNL VIDMPLPISL GVNIPIDIPI TASAGNITLM 3 60 

GVTIPPTGDI VLSSIAGQRA HFGPITIPNI TWGPTTTVA I GGPNTAI T I TGGGAIRIPL 42 0 

ISIPAAPGFG NSTTNPSSGF FNTGAGGASG FGNFGGANSG FWNLASATSG ASGLLNVGAL 48 0 

GSGLANVGTT VSGFYNTSTS DLATPAFNSG LANISTSIAG LLRDSTGTMV LNLGLANHGT 540 

35 LNVGIANLGD YNIGFANLGS ANFGSANIGG NNIGGANTGI FDIGLANLGS YNI GFGNFGD 60 0 

DNLGFGNLGS YNVGFGNLGN DNLGFANTGS NNI GFANTGS NNIGIGLTGD GQIGFGSLNS 660 

GSGNIGLFNS GSGNIGFFNS GNGNVGIGNT GTANFGLGNT GSTNTGFFNS GDWTGIGNT 72 0" 

GSFNTGSFNP GDSNTGDFNP GSYNTGLGNT GDVDTGAF I S GSYSNGFLWS GNYQGL I GLH 780 

AALAIPEIAL TFGVDIPIHI PINIDAGWT LQGFS IVAAE NNIDFTPIII PTINITLPTA 840 

40 AITVGGPTTS IGITASAGIG SITIPIIDIP ATSGFGNSTT SPSSGFFNSG AGSASGFLNV 900 

VAGASGISGY LNVGALGSGV TNVGHTVSGF YNASALDLVT PAFAS GLMRD GMGTMTLNLG 960 

LANL GSNNAG FGNTGIFDVG VANLGNYNIG FGNFGDDNLG FANLGSYNIG VANTGSNNIG 102 0 

FANTGSNNIG IGLTGTGQIG IGALNSGSGN I GLFNS GDGN IGFFNSGTGN FGI GNTGTGN 1080 

FGIGNSGSTS TGLFNSGDGN TGGFNPGNFN TGNFNTGS FN TGGFNAGNTN TGHFNTGNYN 1140 

45 TG I ANTGDVS TGAF I SGNYS NGILWRGDYQ GLIGYSYALT IPEIPAHLDV NIPIDIPITG 120 0 

SFTDLWDNF TIPIIGFESF AFSFHIHTEP DIGPIIVPSF VLSVPTFAIA VGGPTTAINI 1260 

SATAGLGPIT IPIIDIPAAP GIGNSTTSPS SGFFNTGAGT ASGFGNVGGN TSGLWNLASA 1320 

ASGVSGLLNV GALGSGVANV GNTISGIYNT S PLDLGTPAF GSGLANIAGL LQGGAGTTIL 13 80 

DLAGLGNLNV GLANLGGSNF GIGNTGIFNV GFANVGNHN I GLANLGNYSV GFANS GNYHI 1440 

50 GIANTGSANI GFANTGS GN I GIGLTGTGQI GFGSFNSGSH NIGLFNSGDG NVGFFNSGTG 15 00 

NVGI GNTGTA NFGIANSGSF NTGLGNTGST NTGLFNPGNV NTGVGNTGSI NTGSINTGSF 1560 

NTGSTNTGSF NLGDHNTGSF NSGDYNTGYF NAGDYNTGVA NTGNVNTGAF ISGNYSNGFF 1620 

WRGDYQGLIG LSTTITIPEI PYRYDLSVPI DIPITGTWA TTPNSFTIPG FQIRVLLGPA 16 8 0 

AVLVNEMIGP ITIDVNQVIA IDSPIQQTIS MVGTGGFGPI PIGISIGGTP GFGNSTTGPS 1740 

55 SGFFHTGAGH VSGFGNFGAG NMSGSGNFGA GNS GFFNAGG LGNSGLLNFG ALQSGLANLG 18 00 

NTISGVYNTS TLDLATPAFG SGIANIGANL AGLFLDNTGN LTLNFGVANQ GGLNAGIGNL 1860 

GSVNIGFVNT GDSNLGIGNL GDLNFGGVNI GGNNI GIANT GIFDIGLANL GSYNIGLANL 192 0 

GDDNLGFGNA GSYNIGFANF GSDNLGFANT GSYNIGFANT GNNNI GVGLT GNGQIGIGSL 1980 

NSGSNNIGLF NSGSGNIGFF NSGTGNVGIF NTGTGNFGLA NSGGFNTGIG NAGS TNTGVF 2040 

60 NPGDLNTGSF NPGSFNTGGF NPGSGNTGYL NTGDYNTGVA NTGDVDTGAF ITGSYSNGFL 2100 

VSGDYQGLIG LPLLGIPVTP GYFNLTGGPS SGFFNSGAGS VS GFVNS GAG LSGYLNTGAL 2160 

GS GVANVGNT ISGWLNASAL DLATPGFLSG I GNFGTNLAG FFRG 2204 
<212> Type : PRT 
<211> Length : 2204 

SequenceName : SEQ ID 134

SequenceDescription : 
Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

5 MSFVLIAPEF VTAAAGDLTN LGSSISAANA SAASATTQVL AAGADEVSAR IAALFGGFGL 60 

EYQAISAQVA AYHQRFVQAL STGAGAYASA EAAAAEQIVL GVINAP TQAL LGRPLIGDGA 12 0 

NATTPGGAGG AGGLLFGNGG AGAAGAP GQA GGPGGPAGLW GNGGP GGAGG SGGGTGGAGG 180 

AGGWL FGVGG AGGVGGAGGG TGGAGGPGGL IWGGGGAGGV GGAGGGTGGA GGRAELLFGA 240 

GGAGGAGTDG GPGATGGTGG HGGVGGDGGW LAP GGAGGAG GQGGAGGAGS DGGALGGTGG 3 00 

10 TGGTGGAGGA GGRGALLLGA GGQGGLGGAG GQGGTGGAGG DGVLGGVGGT GGKGGVGGVA 360 

GLGGAGGAAG QLFSAGGAAG AVGVGGTGGQ GGAGGAGAAG ADAPASTGLT GGTGFAGGAG 420 

GVGGQGGNAI AGGINGSGGA GGTGGQGGAG GMGGS GADNA SGXGADGGAG GTGGNAGAGG 480 

AGGAAGTGGT GGWGAAGKA GIGGTGGQGG AGGAGSAGTD ATATGATGGT GFS GGAGGAG 540 

* GAGGNTGVGG TNGSGGQGGT- GGAGGAGGAG GVGADNPTG I GGTGGTGGKG GAGGAGGQGG .6.00, 

15 SSGAGGTNGS GGAGGTGGQG GAGGAGGAGA DNPTGI GGAG GTGGTGGAAG AGGAGGA1GT 660 

GGTGGAVGSV GNAGIGGTGG TGGVGGAGGA GAAAAAGS S A TGGAGFAGGA GGEGGAGGNS 72 0 

GVGGTNGS GG AGGAGGKGGT GGAGGSGADN PTGAGFAGGA GGTGGAAGAG GAGGATGTGG 7 80 

TGGWGATGS AGIGGAGGRG GDGGDGASGL GLGLSGFDGG QGGQGGAGGS AGAGGINGAG* 840 

GAGGNGGDGG DGATGAAGLG DNGGVGGD GG AGGAAGNGGN AGVGLTAKAG DGGAAGNGGN 900 

20 GGAGGAGGAG DNNFNGGQGG AGGQGGQGGL GGASTTSINA NGGAGGNGGT GGKGGAGGAG 960 

TLGVGGS GGT GGDGGDAGSG GGGGFGGAAG KAGGGGNGGR GGDGGDGASG LGLGLSGFDG 1020 

GQGGQGGAGG SAGAGGINGA GGAGGNGGDG GDGATGAAGL GDNGGVGGDG GAGGAAGNGG 10 80 

NAGVGLTAKA GDGGAAGNGG NGGAGGAGGA GDNNFNGGQG GAGGQGGQGG LGGASTTSIN 1140 

ANGGAGGNGG TGGKGGAGGA GTLGVGGSGG TGGDGGDAGS GGGGGFGGAA GKAGGGGNGG 1200 

25 VGGDGGEGAS GLGLGLSGFD GGQGGQGGAG GSAGAGGING AGGAGGTGGA GGDGAPATLI 12 60 

GGPDGGDGGQ GGIGGDGGNA GFGAGVPGDG GDGGNAGFGA GVPGDGGIGG TGGAGGAGGA 1320 

GADGDPS IDG GQGGAGGHGG QGGKGGLNST GLASAASGDG GNGGAGGAGG NGGDGDGF I G 13 80 

GSGGTGGTGG DAGVGGLANT GGTAGNAGIG GAGGRGGDGG AGDSGALSQD GNGFAGGQGG 1440 

QGGVGGNAGA GGINGAGGTG GTGGAGGDGQ NGTTGVAS EG GAGGQGGDGG QGGIGGAGGN 15 0 0 

30 AGFGAGVP GD GGI GGTGGAG GAGGAGADGD PSIDGGQGGA GGHGGQGGKG GLNSTGLASA 1560 

ASGDGGNGGA GGAGGNGGDG DGFIGGSGGT GGTGGDAGVG GLANTGGTAG NAGI GGAGGR 1620 

GGDGGAGDSG AIiS QDGNGFA GGQGGQ GGVG GNAGAGGING AGGTGGTGGA GGDGQNGTTG 1680 

VAS EGGAGGQ GGDGGQGGIG GAGGNAGFGA GVPGDGGIGG TGGAGGAGGA GADGDPS IDG 1740 

GQGGAGGHGG QGGKGGLNST GLASAASGDG GNGGAGGAGG NGGAGGLGGG GGTGGTNGNG 1800 

35 GLGGGGGNGG AGGAGGTPTG SGTEGTGGDG GDAGAGGNGG SATGVGNGGN GGDGGNGGDG 1860 

GNGAPGGFGG GAGAGGLGGS GAGGGTD GDD GNGGSPGTDG S 1901 
<212> Type : PRT 
<211> Length : 1901

SequenceName : SEQ ID 135

SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSLVIVAPET VAAAALDVAR IGSSIGAANA AAAGSTTSVL AAGADEVSAA IATLFGSHAR 60 

EYQAISTQVA AFHDRFAQTL SAAVGSYVSA EATNAAPLAT LEHNVLNALN APTQALLGRP 120 

L I GDGAAGAP GTGQAGGAGG I LWGNGGAGG SGAPGQVGGA GGAAGL FGTG GAGGAGGAGA 180 

AGGAGGSGGW LLGNGGVGGA GGQSLLGGAT GGAGGNAGLF GVGGTGGPGG PGGPGGVGGT 240 

50 GGAGGLGGTL YGAGGHGGAG GPGPIGGVGG HGGVGGAAGL LGVGGHGGAG GHGAE GVAGA 3 00 

AGEDLSPHGT SGGVGGDAGD GGTGGRGGWL AGAGGAGGAG GVGGTGGAGG AGFSRALIVA 3 60 

GDNGGDP GAG GAGGTGGAGS T I GAHGAAG A SPTSGGNGGA GGNGAHFSSG GKAGGNGGAG 420 

GAGGLVGNGG AGGAGGNGAP GAPPSGGDPN GGGGGAGGAG GKGGDGGAQA GDGGAGGAGG 480 

KGGNGGNGAT GATGLNGLGA GADGTDGGKG GNGGAGGGGG AGGQGGKALA ATHQDGSMGA 540 

55 GGAGGNGGAG GMGGD GGNGA KGTFDNGGDG VGGNGGNGGS RGIGGAGGIG GAGS TAGADG 600 

ARGATPTSGG NGGTGGNGAN ATVAGGAGGA GGKGGNGGLV GNGGAGGKGG DGMAGVAGSS 660 

PTTAGESGTS GQNGGAGGAG GAGGRGGDFG GDGGTGGAGG NGANGANATT PGAKGGDGGH 72 0 

GGPGAQGGNG GQGGPGGLAG NLFGQNGIQG VGGS GGKGGA GGLAGDGGNG ANGNFAFGDG 78 0 

NGGHGGNGGN PGAGGQGGSG GAGS TP GAKG AHGFTPTSGG DGGDGGNGGN SQWGGNGGD 840 

60 GGNGGNGGSA GTGGNGGRGG DGAFGGMSAN ATNPGENGPN GNP GGNGGAG GAGGAGLNGG 900 

NGGAGGNGGL GGFGGNGAAG ANGVAVGAPG QPGGAGGHGG AGGNGGAGGN GGQGWSDGA 960 

GGAGGAGGDG GAP GDGANGG NGQGAGAFAG GGGGRGGDGG NAGNAGAGGP GGTGS TAGKA 102 0 

GPAGSILHDG GNGGHGGHGA AS GGNGGP GG HGGNGGNGGT GANGGNGGIG GTGGAGSTGA 1080 

KGVLGTNEGD GGD GGRGGNG GRGGNGGQGL TGAGGNGGTG GTP GNGGNGG NGASGDLVTS 1140 

65 PGDGGGGGRG GDAGRGGDAG LGGSSGPGGT PGDWGTGGTG GTGGTGGQGA NGGLTGGRGG 1200 

TGGNGGNGNT GGTGGAGGTG GTGHNGSQPG MGGNGGAGGF GGNGFAGVGG RGGMGGSGGT 12 60 

GGTGDAGPFG TGTGGTGGHG GQGGGGGFSI LLGLGGLGGL GSPGSIATGT AGGAGGGGGF 132 0 
GGLGGGEFV 1329

<212> Type : PRT 
<211> Length : 1329 

SequenceName : SEQ ID 136
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSYVIATPEM MATAAFDLAR IGSQVSAASA VAAMPTTEW AAGADEVSAG IAALFSAHAQ 60 

EYQALSAQAA AFHDQFVHTL TAAARWYTAT E I ANAAAMRV VLGAVNAPTQ TLLGRPLIGD 12 0 

GAHGTAPGQP GGAGGLLFGN" GGNGAAGAVG QVGGAGGAAG LFGIGGAGGA GGAGAPGGTG 18 0 

GTGGWLAGGG GVGGMGGAGG .GAGGAGGNAG LF GNGGAGGA GGAGGGAGGA GGNAGWFGHG 240, 

15 GAGGVGGVGA AGANGATPGQ DGAAGVAGSD DGAGGDGLAG SDGGDGGAGG VGGNGGRGGW .3 00 

LLGNGGAGGV GGVGGAGGAG AAGGAGGAGA TGINGPAGIS AAGGDGGAGG NGGAGGNGGV 3 60 

GGAGGAGGSA GLLGYVGRAG DGGAGGGGGL GGAP GDGGAG GNGGSWLAAG DGGAGGHGGD 42 0 

PGLGGAGGAG GASGGAGARA GANGLAAGND GPVSGGMGGK GGNGAHAPVA GGHGGNGGAG 48 0 

GNGGLVGDGG AGGHGGDGAA GAGYADMTAI FLGSSGTPGE DGGNGGAGGA GGAGGAHAGD 540 

20 GGAGGAGGNG GAGGAGGNGA HGFNAVLVSD GGNGGDGGAG GRGGDGGAGG AGGDAPAGRA 600 

GSQGVGGDGG AGGAGGAPGN" GGSGGRGDMA FKDGDGGAGG DGGDPGAGGK GGAGGAGATE 660 

GVTGATGATV HSGGNGGKGG NGADATVAGA NGGKGGAGGN GGLVGDGGAG GDGGS GAAGA 72 0 

NGANVGEDGA DGTLSGQPGE GSEANGGQGG VGGGGAGGAG GDGGAGS SAL GSGGNGGRGD 78 0 

AGQAGGAGGA GGAGGAGGSV SGDGGPGGKG GAGGAGGAGA SGGGGGKGAS GADSAEAVGG 840 

25 AGGKGGD GGV GGVGGD GGPG GDGGAGGAAP AGQVGSHGVG GVGGDGGLGG AGGNGGD GGH 90 0 

GSDGGDGGDG GDPGAGGLGG LGGDSGNGTR AASGVDASDH GPGSGGNGGN GGNGAQASVA 960 

GGAGGNGGDG GNAGRVGDGG AGGNGGDGAA GANGANS GAP GSDALALGQP GGMGGQGDAG 1020 

QAGGAGGAGG AGGAGGSVSG DGGAGGNGGA GGNGGVGASG GAGARGANGI DS I GGTGGAG 10 8 0 

GGGGDGGAGG VGGHGGDGGV GGAAPSGTVG SHGTGGVGGD GGLGGAGGVG GAGGNGGIGI 1140 

30 TVGGAGGAGG NGGDP GAGGR GGLGGDSGNG TSAANGVDAS KHGPLTGGDG GVGGNGAKAA 12 0 0 

AAGGDGGQGG DGGNAGLFGD GGAGGDGADG TAAEALGGDG GAGGAGGKGG DAGDIGDGGD 1260 

GGKGGDGAHG ALGGLTVAGG NGGAGGAGGA GGAGGAFLGD GGNGGAGGQG GAGRGGSPGG 13 20 

GGGVGGHGGA GGDAGMNGGG GTGGQGGNGA AGGAGWSPDS DLKGFDGFDG GSGGAGGDGG 13 80 

AGGAGGTQTG DGGDGGAGGL GGAGGVGGNG VDGFDINETT GRDGGD GGDG GYGGWGGAGG 1440 

35 NGGAGGSAPA GEVGNRGVGG DGGDGGSGGD AGNGGLGGDG FTYLADFDGE PGGDGGDGGD 1500 

GGWGRP GGQG GFGSTSGAHG KAGFGAPGGD GGD GGNGGHG GDGNGSFADA GDGGPGGNGG 1560 

NGGLGGAGRD GGAPGGD GGD GGTGGS GGFG APPPRSIGGG DGGDGGRGGD GGRGAGGL T S 1620 

GGVGSSGESG GSGNGRGDPG SGGSGGEGGE GGPSISVNVT 1660 
<212> Type : PRT 

<211> Length : 1660

SequenceName : SEQ ID 137 
SequenceDescription : 

Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MSFVLVSPET VAAVATDLKR I GAS LAHENA SAAASTTAW SAAADEVSTA VAALFS QHAQ 60 

GYQAAAAQVA AFHS RFVQAL TAGAGAYAFA EAANASPLQS AMGAVSASAQ TLLSRPLIGN 120 

50 GANATTP GGN GGDGGWLFGS GGNGAP GAAG QSGGNGGSAG LWGNGGAGGA GGS GGAAGGN 180 

GGNGGWLFGA GGTGGI GGTG APGAMGGTGG NGGNGALLIG GGGLGGAGGM GGTGGGTGGT 240 

GGNGGNGALL IGAGGVGGAG GIGGQGTGAG GAAGAGGTGG NGGAGGLFMN GGD GGAGGQ G 3 00 

GDGAAGDAAA SAGGTGGKGG QGGDGGTGGA GGAGPVLFGH GGAGGMGGQG GTGGMGGAGG 3 60 

DGTTVIAAGT GGEGGTGGAA GAGGAAGARG ALTSGGLAGG VGAGGTGGTG GTGGNGADAA 420 

55 AWGFGANGD PGFAGGKGGN GGI GGAAVTG GVAGD GGTGG KGGTGGAGGA GNDAGSTGNP 480 

GGKGGDGGIG GAGGAGGAAG TGNGGHAGNT GDGGDGGTGG NGGNGTGGVN GADNTLNPDT 540 

PGGAGE P GGA GGAGGAGGAA GGPGGTGGTG GNGGNGGNGG NGGNGGNGGN GGNAGNNSTN 600 

APVGGEGGAG GDGGAGGAGG AANGGTAGSQ GTGGVGGDGG AGGNGGGGKA GTGNSGNFGV 660 

DGEAGFSGGA GGNGGVGGAA GANGGTGGSG GNGGDGGAGG I GG AGGNG IP GTGTE PAGGT 720 

60 GAKGGDGGDG GAGGAGGNAG GAGGQGGNAG QGGAGGAGGN AVI PGDGVGK APHGDAGGSG 780 

GDGGKGGQGG S GGTGGS GAP I GGGAGGTGG SGGHAGKGGA GGIGAQGTTI TVP GMGGNAG 840 

DGGNGGNAGA GGNGGSGDFG GNTTSGASGS GGNGGNAGTA GSGGAGGTGG TGLSGGNGGN 900 

GGNGGNGGDG GNGAHGTVGA QF VP ATS LPT PNGGAGGNGG TGSNGGAPGP AGAPGPTTGG 960 

NAGS QGI GGD GGNGGDGGKG GDGADAVNW FMPTEPQAAT GTAGSAGDPT GGNGGPGTPG 1020 

65 SPMVAPPPPT PITQVQQGGD GGAGGTGSTN ANDGTATGGK GGEGGVGSIL GGPGGNGGTG 1080 

GNASATGTNG VANAGNGGKG GDGGQFGAGG NGGAGGSVTD GS AGS T AGNG GNGGNATNGT 1140 

IAGQPAGGNG SAGGKGGDGG NIAAGATGTA GNGGNGGNGN DGAVNAGTGG SGGNGGNAGG 1200
GGANGGDGGA GGAGGAGGRG GKGIDGGFGG DGGNGGSNNG TGAGGNGGNG GTGGVGSVGA 1260
AGGDGGNGGT GGPAGFGGTA GNGGSGGTGG AGGDGGTGGD GGNGVIAGGG GTGGNGGASG 1320
AGGAGGTGGF AGNGNAGGNG GTGGASEDGD NGNAGSGATG GTGGNGGTGG DGGAAGLGGV 1380
A 1381
<212> Type : PRT

<211> Length : 1381

SequenceName : SEQ ID 138

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIATPEM LTTAATDLAK IGSTXTAANT AAAAVAKVLP ASADEVSVAV AALFGTHAQE 60

15 YQ TVS AQ VAT FHDRFVQTLS AAASSYVAAE AVNVEQSLLA AVNAPTQALF GRPLIGNGAD 12 0 

GSPGTGQAGG PGGILYGNGG NGGSGAPGQR GGAGGAAGL I GNGGNGGAGG VGTTGGAGGH 18 0 

GGAGGWLYGN GGAGGFGGAG AVGGNGGAGG TAGL FGVGGA GGAGGNGIAG VTGTSASTPG 240 

GSGTAGGAGG I GGNGGAGGA GGVLMGNGGN GGAGGEGGPG GAGGAGASGA HATMLGADGQ 3 00 

AGGNGGNGGA GGTGGVGGPG GGHGLLGLGG SHGAGGAGGS GGDGGAPGDG GNGATGTWGH 3 60 

20 NLGAGGTGGN GGNPGAGGAG GAGGASVGGS AHGANGAPGT TSTSGGNGGD GGKGADAISS 42 0 

GQTGANGGRG GDGGQVGNGG AGGAGGRGGA GGLGFGSEAP GRPGGAGGTG GAGGNGGTQA 48 0 

GD GGTGGAGG AGGDGGS GGA GSIGFNASAP GAAGSPGGNG GNGGPGGAGG EGGAGGLALA 540 

ASGQNGSQGA GGDGGAGGNG GTPGNGGHGA AGALGVNGGV GGAGGHGGDP GVGGAGGQGG 60 0 

SGSTPGANGA PGNTPTSGGN GGNGGRGADA TGFGQTGASG GRGGDGGLVG NGGAGGAGGN 660 

25 GSKGLPGLGR LGNPGLDGGT GGNGGAGGSG GAWAGNGGTG GAGGTGGVGG TGGSGSDGVN 72 0 

GSSAGADGHP GGTGGVGGTG GKGGDGGDGG AAPNGVAGSQ GPGGAGGDGG TGGVGGNGGR 780 

G I DGADGATA GARGQDGGAG GAGGKGGRGG TGGPGGAGPA GTTGSQGAGG NGGSGGTGGD 84 0 

PGDGGNGANG SVFTNNGIGG NGGNGGNAGP SGAGGSGGAG STFGATGSSS SIHVNGGNGG 90 0 

NGGNGDHALS GNGAAGGNGG NGGNGSLRGS GGAGGHGGNG GNASRGMGGD GGTGGAGGNA 960 

30 GQXGNGGAGG NGGDGGTGSD GNPGAITGSG GRGGDGGVGG QGGSVAGDGA DGGRGGAGGT 1020 

GGTGLRGTTG ATGATGTFDA GADGHGGNGG TGGVGGTGGA GGGGGNGGAG GKALSPTGNXT 1080 

GSQGAGGDGG AGGAGGTGGT GGDGGRGAHG TLFSSLAGTG GTGGNGGTGG TGGTGGAGGA 1140 

GGTGSTLGAT GATGAAGRAG NGGVGGS GGL GSAFGP GGTG GMGGAGGTST VSAGGDGGRG 1200 

GFGGDGLDAS SGGNGGDGGH GGDGFRTAGA GGRGGDGGKG ADPGGLFPIP GAGGKGGTGG 1260 

TGGTAHLGPL AIXGQSGQPG QFGSPGADGR GGAGGAGGGG GAGGSF 1306
<212> Type : PRT
<211> Length : 1306

SequenceName : SEQ ID 139
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

45 MS AAAVAWD Q LAMELASAAA SFNSVTSGLV GESWLGPSSA AMAAAVAP YL GWLAAAAAQA 60 

QRSATQAAAL VAE F EAVRAA MVQPALVAAN RSDLVSLVFS NFFGQNAPAI AAXEAAYEQM 120 

WAIDVSVMSA YHAGASAVAS ALTPFTAPPQ NLTDLPAQLA AAPAAWTAA ITSSKGVLAN 180 

LSLGLANSGF GQMGAANLGI LNLGSLNPGG NNFGLGNVGS NNVGLGNTGN GNIGFGNTGN 240 

GNI GFGLTGD NQQGFGGWNS GTGNIGLFNS GTGNIGIGNT GTGNFGI GNS GTSYNTGIGN 300 

50 TGQANTGFFN AGIANTGIGN TGNYNTGS FN LGS FNTGDFN TGSSNTGFFN PGNLNTGVGN 360 

TGNVNTGGFN SGNYSNGFFW RGDYQGLIGF SGTLTIPAAG LDLNGLGSVG PI TIPS I TIP 420 

EIGLGINSSG ALVGPINVPP ITVPAIGLGI NSTGALVGPI NIPPITLNSI GLELSAFQVI 480 

NVGSISIPAS PLAIGLFGVN PTVGSIGPGS ISIQLGTPEI PAIPPFFPGF PPDYVTVSGQ 540 

IGPITFLSGG YSLPAIPLGI DVGGGLGPFT VFPDGYSLPA IPLGIDVGGG LGPFTVFPDG 600 

55 YSLPAIPLGI DVGGGLGPFT VFPDGYSLPA IPLGIDVGGA IGPLTTPPIT IPSIPLGIDV 660 

SGSLGPINIP IEIAGTPGFG NSTTTPSSGF FNSGTGGTSG FGNVGSGGSG FWNIAGNLGN 72 0 

SGFLNVGPLT SGILNFGNTV SGLYNTSTLG L AT S AFHSGV GNTDS QLAGF MRNAAGGTLF 780 

NFGFANDGTL NLGNANLGDY NVGSGNVGSY NFGSGNIGNG SFGFGNIGSN NFGFGNVGSN 840 

NLGFANTGPG LTEALHNIGF GNI GGNNYGF ANIGNGNIGF GNTGTGNIGI GLTGDNQVGF 900 

60 GALNSGSGNI GFFNS GNGN I GFFNSGNGNV G I GNS GNYNT GLGNVGNANT GLFNTGNVNT 960 

GIGNAGSYNT GSYNAGDTNT GDLNPGNANT GYLNLGDLNT GWGNIGDLNT GALISGSYSN 1020 

GILWRGDYQG LIGYSDTLSI PAIPLSVEVN GGIGPIWPD ITIPGIPLSL NALGGVGP IV 1080 

VPDITIPGIP LSLNALGGVG PIWPDITIP GIPLSLNALG GVGPIWPDI TIPGIPLSLN 1140 

ALGGVGP I W PDITIPGIPL SLNALGGVGP ITVPGVPISR IPLTINIRIP VNITLNELPF 1200 

65 NVAGIFTGYI GPIPLSTFVL GVTLAGGTLE SGIQGFSVNP FGLNIPLSGA TNAVTI PGFA 1260 

INPFGLNVPL SGGTSPVTIP GFAINPFGLN VPLSGGTSPV TIPGFTIPGS PLNLTANGGL 1320 

GPINIPINIT SAPGFGNSTT TPSSGFFNSG DGSASGFGNV GPGISGLWNQ VPNALQGGVS 1380
GIYNVGQLAS GVANLGNTVS GFNNTSTVGH LTAAFNSGVN NIGQMLLGFF SPGAGP 1436

<212> Type : PRT
<211> Length : 1436
SequenceName : SEQ ID 140

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MEFPVLPPEI NSVLMYS GAG SSPLLAAAAA WDGLAEELGS AAVS FGQVTS GLTAGVWQGA 60 

AAAAMAAAAA PYAGWLGSVA AAAEAVAGQA RWVGVFEAA LAATVDPALV AANRARLVAL 120 

AVSNLLGQNT PAIAAAEAEY ELMWAADVAA MAGYHSGASA AAAALPAFSP PAQALGGGVG 180

AFLTALFASP AKALSLNAGL GNVGNYNVGL GNVGVFNLGA GNVGGQNLGF GMAGGTNVGF 240

GNLGNGNVGF GNSGLGAGLA GLGNIGLGNA GSSNYGFANL GVGNIGFGNT GTNNVGVGLT 3 00 

GNHLTGIGGL NSGTGNIGLF NSGTGNVGFF NSGTGNFGVF NSGNYNTGVG NAGTASTGLF 360 

NAGNFNTGW NVGSYNTGSF NAGDTNTGGF NPGGVNTGWL NTGNTNTGIA NSGNVNTGAF 420 

ISGNFNNGVL WVGDYQGLFG VSAGSSIPAI PIGLVLNGDI GPITIQPIPI LPTIPLSIHQ 480 

20 TVNLGPLWP DIVIPAFGGG IGIPINIGPL TITPITLFAQ QTFVNQLPFP TFSLGKITIP 540 

QIQTFDSNGQ LVSFIGPIVI DTTIPGPTNP QIDLTIRWDT PPITLFPNGI SAPDNPLGLL 60 0 

VSVSISNPGF TIPGFSVPAQ PLPLSIDIEG QIDGFSTPPI TIDRIPLTVG GGVTIGPITI 660 

QGLHIPAAPG VGNTTTAPSS GFFNSGAGGV SGFGNVGAGS SGWWNQAPSA LLGAGSGVGN 720 

VGTLGSGVLN LGSGISGFYN TSVLPFGTPA AVSGIGNLGQ QLSGVSAAGT TLRSMLAGNL 780

25 GLANVGNFNT GFGNVGDVNL GAANI GGHNL GLGNVGDGNL GLGNIGHGNL GFANLGLTAG 840 

AAGVGNVGFG NAGINNYGLA NMGVGNIGFA NTGTGNIGIG LVGDHRTGIG GLNSGIGNIG 90 0 

LFNSGTGNVG FFNSGTGNFG IGNSGRFNTG I GNS GTASTG LFNAGSFSTG IANTGDYNTG 960 

SFNAGDTNTG GFNPGGINTG WFNTGHANTG LANAGTFGTG AFMTGDYSNG LLWRGGYEGL 102 0 

VGVRVGPTIS QFPVTVHAIG GVGPLHVAPV PVPAVHVEIT DATVGLGPFT VPPISIPSLP 1080 

30 IASITGSVDL AANTISPIRA LDPLAGS I GL FLEPFRLSDP FITIDAFQW AGVLFLENII 1140 

VPGLTVSGQI LVTPTPIPLT LNLDTTPWTL FPNGFTIPAQ TPVTVGMEVA NDGFTFFPGG 120 0 

LTF PRASAGV TGLSVGLDAF TLLPDGFTLD TVPATFDGTI LIGDIPIPII DVPAVP GFGN 1260 

TTTAPSSGFF NTGGGGGSGF ANVGAGTSGW WNQGHDVLAG AGS GVANAGT LSSGVLNVGS 1320 

GISGWYNTST LGAGTPAWS GIGNLGQQLS GFLANGTVLN RSPIVNIGWA DVGAFNTGLG 13 80 

35 NVGDLNWGAA NIGAQNLGLG NLGSGNVGFG NIGAGNVGFA NSGPAVGLAG LGNVGLSNAG 1440 

SNNWGLANLG VGNI GLANTG TGNIGIGLVG DYQTGIGGLN SGSGNIGLFN SGTGNVGFFN 1500 

TGTGNFGLFN SGSFNTGIGN SGTGSTGLFN AGNFNTGIAN PGSYNTGSFN VGDTNTGGFN 15 60 

PGDINTGWFN TGIMNTGTRN TGALMSGTDS NGMLWRGDHE GLFGLSYGIT IPQFPIRITT 1620 

TGGIGPIVIP DTTILPPLHL QITGDADYSF TVPDIPIPAI HIGINGWTV GFTAPEATLL 1680 

40 SAL KNNGS F I SFGPITLSNI DIPPMDFTLG LPVLGPITGQ LGPIHLEPIV VAGIGVPLEI 1740 

EPIPLDAISL SESIPIRIPV DIPASVIDGI SMSEWPIDA SVDIPAVTIT GTTISAIPLG 1800 

FDIRTSAGPL NIPIIDIPAA PGFGNSTQMP SSGFFNTGAG GGSGIGNLGA GVSGLLNQAG 1860 

AGSLVGTLSG LGNAGTLASG VLNSGTAISG LFNVSTLDAT TPAVISGFSN LGDHM SGVS I 1920 

DGLIAILTFP PAESVFDQII DAAIAELQHL D I GNALALGN VGGVNLGLAN VGE FNLGAGN 1980 

45 VGNINVGAGN LGGSNLGLGN VGTGNL GFGN IGAGNFGFGN AGLTAGAGGL GNVGLGNAGS 2040 

GSWGLANVGV GNI GLANTGT GNIGIGLTGD YRTGIGGLNS GTGNLGLFNS GTGNI GFFNT 2100 

GTGNFGLFMS GSYSTGVGNA GTAS TGLFNA GNFNTGLANA GSYNTGSLNV GSFNTGGVNP 2160 

GTVNTGWFNT GHTNTGLFNT GNVNTGAFNS GSFNNGALWT GDYHGLVGFS FSIDIAGSTL 2220 

LDLNETLNLG PIHIEQIDIP GMSLFDVHEI VEIGPFTIPQ VDVPAIPLEI HESIHMDPIV 22 80 

50 LVPATTIPAQ TRTIPLDIPA SPGSTMTLPL ISMRFEGEDW ILGSTAAIPN FGDPFPAPTQ 23 40 

GITIHTGPGP GTTGELKISI PGFEIPQIAT TRFLLDVNIS GGLPAFTLFA GGLTI PTNAI 24 0 0 

PL T I DAS GAL DPITIFPGGY TIDPLPLHLA LNLTVPDSSI PIIDVPPTPG FGNTTATPSS 2460 

GFFNSGAGGV SGFGNVGSNL S GWWNQAAS A LAGSGSGVLN VGTLGSGVLN VGSGVSGIYN 252 0 

TSVLPLGTPA VLSGLGNVGH QLSGVSAAGT ALNQIPILNI GLADVGNFNV GFGNVGDVNL 2580 

55 GAANLGAQNL GLGNVGTGNL GFANVGHGNI GFGNS GL TAG AAGLGNTGFG NAGSANYGFA 2 64 0 

NQGVRNIGLA NTGTGNIGIG LVGDNLTGIG GLNSGAGNIG LFNS GTGNI G FFNSGTGNFG 2700 

IGNSGSFNTG IGNSGTGSTG LFNAGSFNTG VANAGS YNTG SFNAGDTNTG GFNPGTINTG 27 60 

WFNTGHTNTG IANSGNVGTG AFMSGNFSNG LLWRGDHEGL FSLFYSLDVP RITIVDAHLD 2820 

GGFGPWLPP IPVPAVNAHL TGNVAMGAFT IPQIDIPALT PNITGSAAFR IWGSVRIPP 28 80 

60 VSVIVEQIIN AS VGAEMRI D PFEMWTQGTN GLGITFYSFG SADGSPYATG PLVFGAGTSD 2940 

GSHLTISASS GAFTTPQLET GPITLGFQVP GSVNAITLFP GGLTFPATSL LNLDVTAGAG 30 00 

GVDIPAITWP EIAASADGSV YVLASSIPLI NIPPTPGIGN STITPSSGFF NAGAGGGSGF 3 060 

GNFGAGTSGW WNQAHTALAG AGS GFANVGT LHSGVLNLGS GVSGIYNTST LGVGT PAL VS 3120 

GLGNVGHQLS GLLSGGSAVN PVTVLNIGLA NVGSHNAGFG NVGEVNLGAA NLGAHNLGFG 3180 

65 NIGAGNLGFG NIGHGNVGVG NSGLTAGVPG LGNVGLGNAG GNNWGLANVG VGNI GLANTG 3240 

TGNIGIGLTG DYQTGIGGLN SGAGNLGLFN SGAGNVGFFN TGTGNFGLFN SGS FNTGVGN 33 00 

SGTGSTGLFN AGSFNTGVAN AGSYNTGSFN VGDTNTGGFN PGSINTGWLN AGNANTGVAN 3360
AGNVNTGAFV TGNFSNGILW RGDYQGLAGF AVGYTLPLFP AVGADVSGGI GPITVLPPIH 3420

IPPIPVGFAA VGGIGPIAIP DISVPSIHLG LDPAVHVGSI TVNPITVRTP PVLVSYSQGA 3480 

VTSTSGPTSE IWVKPSFFPG XRIAPSSGGG ATS TQGAYF V GPISIPSGTV TFPGFTIPLD 3540 

PIDIGLPVSL TIPGFTIPGG TLIPTLPLGL ALSNGIPPVD IPAIVLDRIL LDLHADTTIG 3600 

PINVPIAGFG GAPGFGNSTT LPSSGFFNTG AGGGSGFSNT GAGMSGLLNA MSDPLLGSAS 3660

GFANFGTQLS GILNRGAGIS GVYNTGALGV VTAAWSGFG NVGQQLSGLL FTGVGP 3716

<212> Type : PRT
<211> Length : 3716
SequenceName : SEQ ID 141

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MNFPVLPPEI NSVLMYSGAG SSPLLAAAAA WDGLAEELGS AAVS FGQVTS GLTAGVWQGA 60 

AAAAMAAAAA PYAGWLGSVA AQAVAVAGQA RAAVAAFEAA LAATVDPAAV AVNRMAMRAL 120 

AMSNLLGQNA AAIAAVEAEY ELMWAADVAA MAGYHS GAS A AAAALPAFSP PAQALGGGVG 180 

20 AFLNALFAGP AKMLRLNAGL GNVGNYNVGL GNVG I FNLGA ANVGAQNLGA ANAGSGNFGF 240 

GNIGNANFGF GNSGLGLPPG MGNIGLGNAG SSMYGLANLG VGNIGFANTG SNNIGIGLTG 3 00 

DNLTGIGGLN S GTGNLGL FN SGTGNIGFFN SGTGNFGVFN SGSYNTGVGN AGTASTGLFN 3 60 

VGGFNTGVAN VGS YNTGS FN AGNTNTGGFN PGNVNTGWLN TGNTNTGIAN SGNVNTGAFI 420 

SGNFSNGVLW RGDYEGLWGL SGGSTIPAIP IGLELNGGVG PITVLPIQIL PTIPLNIHQT 480 

25 FSLGPLWPD I VI PAFGGGT AIPISVGPIT ISPITLFPAQ NFNTTF PVGP FFGLGWNIS 540 

GIEIKDLAGN VTLQLGNLNI DTRINQSFPV TVNWSTPAVT IFPNGISIPN NPLALLASAS 600 

IGTLGFTIPG FTIPAAPLPL TIDIDGQIDG FSTPPITIDR IPLNLGASVT VGPILINGVN 660 

IPATPGFGNT TTAPSSGFFN SGDGGVSGFG NFGAGSSGWW NQAQTEVAGA GSGFANFGSL 720 

GSGVLNFGSG VSGLYNTGGL PPGTPAWSG IGNVGEQLSG LSSAGTALNQ SLIINLGLAD 78 0 

30 VGSVNVGFGN VGDFNLGAAN IGDLNVGLGN VGGGNVGFGN IGDANFGLGN AGLAAGLAGV 840 

GNIGLGNAGS GNVGFGNMGV GNIGFGNTGT NNLGIGLTGD NQTGIGGLNS GAGNI GLFNS 900 

GTGNVGLFNS GTGNFGLFNS GSFNTGIGNG GTGSTGLFNA GNFNTGVANP GSYNTGSFNV 960 

GDTNTGGFNP GS INTGWFNT GNANTGVANS GNVDTGALMS GNFSNGILWR GNFEGLFGLN 102 0 

VGITIPEFPI HWTSTGGIGP IIIPDTTILP PIHLGLTGQA NYGFAVPDIP IPAIHIDFDG 1080 

35 AADAGFTAPA TTLLSALGIT GQFRFGPITV SNVQLNPFNV NLKLQFLHDA FPNEFPDPTI 1140 

SVQIQVAIPL TSATLGGLAL PLQQTIDAIE LPAISFSQSI PIDIPPIDIP ASTINGISMS 12 0 0 

EWPIDVSVD IPAVTITGTR IDPIPLNFDV LSSAGPINIS IIDIPALPGF GNSTELPSSG 1260 

FFNTGGGGGS GIANFGAGVS GLLNQASSPM VGTLSGLGNA GSLASGVLNS GVDISGMFNV 1320 

STLGSAPAVI SGFGNLGNHV SGVS IDGLLA MLTSGGSGGS GQPSIIDAAI AELRHLNPLN 1380 

40 I VNLG3WGS Y NLGFANVGDV NLGAGNLGNL NLGGGNLGGQ NLGLGNLGDG NVGFGNLGHG 1440 

NVGFGNSGLG ALPGIGNIGL GNAGSNNVGF GNMGLGNIGF GNTGTNNLGI GLTGDNQTGF 1500 

GGLNSGAGNL GLFNSGTGNI GFFNTGTGNW GLFNSGSYNT GIGNSGTGST GLFNAGSFNT 1560 

GLANAGSYNT GSLNAGNTNT GGFNPGNVNT GWFNAGHTNT GGFNTGNVNT GAFNSGSFNN 1620 

GALWTGDHHG LVGFSYSIEI TGSTLVDINE TLNLGPVHID QIDIPGMSLF DIHELVNIGP 168 0 

45 FRIEPIDVPA WLD I HE TMV IPPIVFLPSM TIGGQTYTIP LDTPPAPAPP PFRLPLLFVN 1740 

ALGDNWIVGA SNSTGMSGGF VTAPTQGILI HTGPS SATTG S LALTL P TVT IPTITTSPIP 1800 

LKIDVSGGLP AFTLFPGGLN IPQNAIPLTI DASGVLDPIT IFPGGFTIDP LPLSLALNIS 1860 

VPDSSVPIII VPPTPGFGNA TATPSSGFFN SGAGGVSGFG NFGAGSSGWW NQAHAALAGA 192 0 

GSGVLNVGTL NSGVLNVGSG I SGLYNTAI V GLGTPALVSG AGNVGQQLSG VLAAGTALTQ 1980 

50 SPIINLGLAD VGNYNLGLGN VGDFNLGAAN LGDLNLGLGN I GNANVGF GN IGHGNVGFGN 2040 

SGLGAALGIG NIGLGNAGST NVGL ANMGV G N I GFANTGTN NLGIGLTGDN QTGIGGLNSG 2100 

AGNIGLFNSG TGNIGFFNSG TGNWGLFNSG SFNTGIGNSG TGSTGLFNAG GFTTGLANAG 2160 

SYNTGSFNVG DTNTGGFNPG SINTGWFNTG NANTGIANSG NVDTGALMSG NFSNGILWRG 2220 

NYEGLFSYSY SLDVPRITIL DAHFTGAFGP VWPPIPVLA INAHLTGNAA MGAFTIPQID 22 80 

55 I P ALNPNVTG SVGFGPIAVP SVTIPALTAA RAVLDMAASV GATSEIEPFI VWTSSGAIGP 2340 

TWYSVGRIYN AGDLFVGGNI ISGIPTLSTT GPVHAVFNAA SQAFNTPALN IHQIPLGFQV 24 00 

PGSIDAITLF PGGLTFPANS LLNLDVFVGT PGATIPAITF PEIPANADGE LYVIAGDIPL 2460 

INIPPTPGIG NTTTVPSSGF FNTGAGGGSG FGNFGANMSG WWNQAHTALA GAGS G I ANVG 2520 

TLHSGVLNLG SGLSGIYNTS TLPLGTPALV SGLGNVGDHL SGLLASNVGQ NPITIVNIGL 2580 

60 ANVGNGNVGL GNIGNLNLGA ANIGDVNLGF GNIGDVNLGF GNI GGGNVGF GNIGDANFGF 2640 

GNS GIiAAGLA GMGNIGLGNA GSGNVGWANM GLGNIGFGNT GTNNLGI GLT GDNQSGIGGL 2700 

NSGTGNIGLF NSGTGNIGFF NSGTANFGLF NSGSYNTGIG NSGVASTGLV NAGGFNTGVA 2760 

NAGS YNTGS F NAGD TNTGGF NPGSTNTGWF NTGNANTGVA NAGNVNTGAL ITGNFSNGIL 2820 

WRGNYEGLAG FSFGYPIPLF PAVGADVTGD IGPATIIPPI HIPSIPLGFA AIGHIGPISI 2880

65 PNIAIPSIHL GIDPTFDVGP ITVDPITLTI P GL S LD AAVS EIRMTSGSSS GFKVRPSFSF 2940 

FAVGPDGMPG GEVSILQPFT VAPINLNPTT LHFPGFTIPT GPIHIGLPLS LTIPGFTIPG 30 00 

GTLIPQLPLG LGLSGGTPPF DLPTWIDRI PVELHASTTI GPVSLPIFGF GGAPGFGNDT 3060 
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TAPSSGFFNT GGGGGSGFSN SGSGMSGVLN AISDPLLGSA SGFANFGTQL SGILNRGAGI 312 0 

SGVYNTGTLG LVTSAFVSGF MNVGQQLSGL LFAGTGP 3157 
<212> Type : PRT 
<211> Length : 3157 
5 SequenceName : SEQ ID 142 

SequenceDe script ion : 

Sequence 



10 <213> OrganismName : Mycobacterium tuberculosis H3 7Rv 
<400> PreSequenceString : 

MSFWMPPEI NSLLIYTGAG PGPLLAAAAA WDELAAELGS AAAAFGSVTS GLVGGIWQGP 60 

SSVAMAAAAA P YAGWL S AAA ASAESAAGQA RAWGVFEAA LAETVDPFVI AANRSRLVSL 12 0 

ALSNLFGQNT PAIAAAEFDY ELMWAQDVAA MLGYHTGASA AAEALAPFGS PLASIAAAAE 180

15 PAKSLAVNLG LANVGLFNAG SGNVGSYNVG AGNVGS YNVG GGNIGGNNVG LGNVGWGNFG 240 

LGNSGLTPGL MGLGNIGFGN AGSYNFGLAN MGVGNIGFAN TGSGNFGIGL TGDNLTGFGG 3 00 

FNTGSGNVGL FNSGTGNVGF FNSGTGNWGV FNSGSYNTGI GNSGIASTGL FNAGGFNTGV 3 60 

VNAGSYNTGS FNAGEANTGG FNPGSVNTGW LNTGDINTGV ANSGDVNTGA FISGNYSNGV 42 0 

LWRGDYQGLL GFSSGANVIiP VIPLSLDING GVGAITIEPI HILPDIPINI NETLYLGPLV 48 0 

20 VPPINVPAIS LGVGIPNISI GPIKINPITL WPAQNFNQTI TLAWPVSSIT IPQIQQVALS 540 

PSPIPTTLIG PIHINTGFSI PVTFSYSTPA LTLFPVGLSI PTGGPLTLTL GVTAGTEAFT 600

IPGFSIPEQP LPLAINVIGH INALSTPAIT IDNIPLNLHA IGGVGPVDIV GGNVPASPGF 660 

GNSTTAPSSG FFNTGAGGVS GFGNVGAHTS GWFNQSTQAM QVLPGTVSGY FNSGTLMSGI 72 0 

GNVGTQLSGM LSGGALGGNN FGLGNIGFDN VGFGNAGSSN FGLANMGIGN IGLANTGNGN 78 0 

25 IGIGLSGDNL TGFGGFNSGS ENVGLFNSGT GNVGFFNSGT GNLGVFNSGS HNTGFFLTGN 840 

NINVtiAPFTP GTLFTISEIP IDLQVIGGIG PIHVQPIDIP AFDIQITGGF IGIREFTLPE 90 0 

ITIPAIPIHV TGTVGLEGFH VNPAFVLFGQ TAMAE I TAD P WLPDPFITI DHYGP PLGPP 960 

GAKFPSGSFY LSISDLQING PIIGSYGGPG TIPGPFGATF NLSTSSLALF PAGLTVPDQT 102 0 

PVTVNLTGGL DSITLFPGGL AFPENPWSL TNFSVGTGGF TVFPQGFTVD RIPVDLHTTL 10 80 

30 SIGPFPFRWD YIPPTPANGP I PAVPGGFGL TSGLFPFHFT LNGGIGPISI PTTTWDALN 1140 

PLLTVTGNLE VGPFTVPDIP I PAINFGLD G NVNVSFNAPA TTLLSGLGIT GSIDISGIQI 1200 

TNIQTQPAQL FMSVGQTLFL FDFRDGIELN PIVIPGSSIP ITMAGLSIPL PTVSESIPLN 1260 

FSFGSPASTV KSMILHEII*P IDVSINLEDA VFIPATVLPA IPLNVDVTIP VGPINIPIIT 132 0 

EPGSGNSTTT TSDPFSGLAV PGLGV GLLGL FDGSIANNLI SGFNSAVGIV GPNVGLSNLG 1380 

35 GGNVGLGNVG DFNLGAGNVG GFNVGGGNIG GNNVGLGNVG FGNVGLANS G LTPGLMGLGN 1440 

IGFGNAGSYN FGLANMGVGN IGFANTGSGN FGI GLTGDNIi TGFGGFNTGS GNVGLFNSGT 1500 

GNVGFFNSGT GNWGVFNSGS YNTGI GNSGI ASTGLFNAGG FNTGWNAGS YNTGS FNAGQ 1560 

ANTGGFNPGS VNTGWLNTGD INTGVANSGD VNTGAFISGN YSNGAFWRGD YQGLiLGFSYR 162 0 

PAVLPQTPFL DLTLTGGLGS WIPAIDIPA IRPEFSANVA IDSFTVPSIP IPQIDLAATT 1680 

40 VSVGLGPITV PHLDIPRVPV TLNYLFGSQP GGPLKIGPIT GLFNTPIGLT PLALSQIVIG 174 0 

ASSSQGTITA FLANLPFSTP WTIDEIPLL ASITGHSEPV DIFPGGLTIP AMNPLSINLS 18 00 

GGTGAVT I PA ITIGEIPFDL VAHSTLGPVH ILIDLPAVPG FGNTTGAPSS GFFNSGAGGV 18 60 

SGFGNVGAMV S GGWNQAP S A LLGGGS GVFN AGTLHSGVLN FGSGMSGLFN TSVLGLGAPA 192 0 

LVSGLGSVGQ QLSGLLASGT ALHQGLVLNF GLADVGLGNV GLGNVGDFNIi GAGNVGGFNV 1980 

GGGNIGGNNV GLGNVGWGNF GLGNSGLTPG LMGLGNIGFG NAGSYNFGLA NMGVGNIGFA 2040

NTGSGNFGIG LTGDNLTGFG GFNTGS GNVG LFNSGTGNVG FFNSGTGNWG VFNSGSYNTG 2100 

IGNSGIASTG LFNAGGFNTG WNAGSYNTG SFNAGQANTG GFNPGSVNTG WLNTGDINTG 2160 

VANS GDVNTG AFISGNYSNG AFWRGDYQGL LGFSYTSTII PEFTVANIHA SGGAGPIIVP 222 0 

SIQFPAIPLD LSATGHIGGF TIPPVSISPI TVRIDPVFDL GPITVQDITI PALGLDPATG 2280 

50 VTVGPIFSSG SIIDPFSLTL LGFINVNVPA IQTAPSEILP FTVLLSSLGV THLTPEITIP 2340 

GFHIPVDPIH VELPLSVTIG PFVSPEITIP QLPLGLALSG ATPAFAFPLE ITIDRIPWL 2400 

DVNALLGPIN AGLVIPPVPG FGNTTAVPSS GFFNIGGGGG LSGFHNLGAG MS GVLNA I S D 2460 

PLLGSAS GFA NFGTQLSGIL NRGAD I SGVY NTGALGL ITS ALVSGFGNVG QQLAGLIYTG 252 0 

TGP 2523

<212> Type : PRT

<211> Length : 2523

SequenceName : SEQ ID 143
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIAVPEA LTMAASDLAN IGSTINAANA AAALPTTGW AAAADEVSAA VAALFGSYAQ 60 

65 SYQAFGAQLS AFHAQFVQSL TNGARSYWA EATSAAPLQD LLGWNAPAQ ALLGRPLIGN 120 

GANGADGTGA PGGPGGLLLG NGGNGGS GAP GQP GGAGGDA GLIGNGGTGG KGGDGLVGSG 180 

AAGGVGGRGG WLLGNGGTGG AGGAAGATLV GGTGGVGGAT GLIGSGGFGG AGGAAAGVGT 240

TGGVGGSGGV GGVFGNGGFG GAGGLGAAGG VGGAASYFGT GGGGGVGGDG APGGDGGAGP 300

LLIGNGGVGG LGGAGAAGGN GGAGGMLLGD GGAGGQGGPA VAGVLGGMPG AGGNGGHANW 360

FGSGGAGGQG GTGLAGTNGV NPGSIANPNT GANGTDNSGN GNQTGGNGGP GPAGGVGEAG 420

GVGGQGGLGE SLDGNDGTGG KGGAGGTAGT DGGAGGAGGA GGIGETDGSA GGVATGGEGG 480

DGATGGVDGG VGGAGGKGGQ GHNTGVGDAF GGDGGIGGDG NGALGAAGGN GGTGGAGGNG 540

GRGGMLIGNG GAGGAGGTGG TGGGGAAGFA GGVGGAGGEG LTDGAGTAEG GTGGLGGLGG 600

VGGTGGMGGS GGVGGNGGAA GSLIGLGGGG GAGGVGGTGG IGGIGGAGGN GGAGGAGTTT 660

GGGATIGGGG GTGGVGGAGG TGGTGGAGGT TGGSGGAGGL IGWAGAAGGT GAGGTGGQGG 720

LGGQGGNGGN GGTGATGGQG GDFALGGNGG AGGAGGSPGG SSGIQGNMGP PGTQGADG 778
<212> Type : PRT

<211> Length : 778

SequenceName : SEQ ID 144
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :
PQGADGNAGN GGDGGVGGNG GNGADNTTTA AAGTTGGAGG AGGAGGTGGT GGAAGTGTGG 60 

QQGNGGNGGN GGTGGKGGTG GDGALAGS S G GAGGKGGNGG DAGKAGTGSA PGTAGTGGDG 120 

GKGGNGG I GA AGTTGPVGTG ASGGTGGSGG AGGTGGDGGA ANGGTAGAGG AGGNGGKGGD 18 0 

GGAGVTSSTA GNSGGAGGSG GKGGDAGAGG AGATPGANGI AGNGGD GGDG AAGAVGISGA 24 0 

TGAGDGGHGG TGAAGGNGGT GGAGGSGIDG VGGGTGGTGG NGGNGAIGGA GGDAGGSGNS 3 00 

GGNGGIGGKG GNAGAGGAAG SNGGTVGANG TGGDGGNGGA AGAATAGSNG GAGTGSAGGN 3 60 

GGTGGRGGSG GAGGDGI GGV GGGKGGNGAD GEVGGAGGAG GSGPNTSPGG NGGQGGQGGS 42 0 

GGAGGAAGAG GAGGGANGTA GNGGQGGAGG TGGAGAAS S A TNGGS GGAGG TGGDGGSGGA 480 

GGTGGAGGTG GAAGDGGQGG QGGAGGGAGG QGGAGGAGGT GGNGGNITGG TAGTAGAAGN 540

GGAAGKGGAG GQGGTGGGTG GQGGAGGDGG AGGTGGDRTV GGGTVPAGSG GQGGNAGGGG 600 

AGGQGGADGG SGGDGGDAGT GGNGGNGGNR NSGNGTGGAG GNGGGGANGG AGGAGGS GGG 660 

TGGNGGAGGD AGDAGNGGNG NGTGNGGNGG NGGIAGMGGN GGAGTGSGNG GNGGSGGNGG 720 

NAGMGGNSGT GSGDGGAGGN GGAAGTGGTG GDGGLTGTGG TGGSGGTGGD GGNGGNGADN 78 0 

TANMTAQAGG DGGNGGDGGF GGGAGAGGGG LTAGANGTGG QGGAGGDGGN GAI GGHGPLT 84 0 

DDPGGNGGTG GNGGTGGTGG AGIGSLGGGT GGDGGNGGNG GTGGEGGEVG GAGGTGGAAG 90 0 

NGGDGGTGGT GGGDGGAGGT GGTGGTGGLG DPRVGGSGGD GGTGGS GGAA GNGGNGGNAG 9 60 

AGGNGNGGTG GAGGXGGTGG NGGDAEPGVP PGAGGAGGAG TTGGKGGTGG NGSGTGSGGT 1020 

GGDGGTGGGG GNGGTGWNGG KGDTGSGGGA GDGGKAPAGG TGGAGGDGGA GGKGGSGGV 1079

<212> Type : PRT
<211> Length : 1079

SequenceName : SEQ ID 145

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MVMSLMVAPE LVAAAAADLT GIGQAISAAN AAAAGP TTQV LAAAGD EVS A AIAALFGTHA 60 

QEYQALSARV AT FHEQF VRS LTAAGSAYAT AEAANASPLQ ALEQQVLGAI NAPTQLWLGR 12 0 

PLIGDGVHGA PGTGQPGGAG GLLWGNGGNG GSGAAGQVGG PGGAAGL FGN GGS GGSGGAG 180 

AAGGVGGSGG WLNGNGGAGG AGGTGANGGA GGNAWL FGAG GSGGAGTNGG VGGSGGFVYG 240 

NGGAGGI GGI GGI GGNGGDA GLFGNGGAGG AGAAGLPGAA GLNGGDGSDG GNGGTGGNGG 3 00 

RGGLLVGNGG AGGAGGVGGD GGKGGAGDPS FAVNNGAGGN GGHGGNPGVG GAGGAGGLLA 3 60 

GAHGAAGATP TSGGNGGDGG I GATANS PL Q AGGAGGNGGH GGLVGNGGTG GAGGAGHAGS 420 

TGATGTALQP TGGNGTNGGA GGHGGNGGNG GAQHGD GGVG GKGGAGGSGG AGGNGFDAAT 480 

LGS PGADGGM GGNGGKGGDG GKAGDGGAGA AGDVTLAVNQ GAGGDGGNGG EVGVGGKGGA 540 

GGVSANPALN GSAGANGTAP TSGGNGGNGG AGATPTVAGE NGGAGGNGGH GGSVGNGGAG 600 

GAGGNGVAGT GLALNGGNGG NGGIGGNGGS AAGTGGD GGK GGNGGAGANG QDFSASANGA 660 

NGGQGGNGGN GGI GGKGGDA FATFAKAGNG GAGGNGGNVG VAGQGGAGGK GAI PAMKGAT 720 

GADGTAPTSG GDGGNGGNGA SPTVAGGNGG DGGKGGSGGN VGNGGNGGAG GNGAAGQAGT 780 

PGPTSGDSGT SGTDGGAGGN GGAGGAGGTL AGHGGNGGKG GNGGQGGIGG AGERGADGAG 840 

PNANGANGEN GGSGGNGGDG GAGGNGGAGG KAQAAGYTDG ATGTGGD GGN GGDGGKAGDG 90 0 

GAGENGLNSG AMLPGGGTVG NPGTGGNGGN GGNAGVGGTG GKAGTGSLTG LDGTDGITPN 960 

GGNGGNGGNG GKGGTAGNGS GAAGGNGGNG GSGLNGGDAG NGGNGGGALN QAGFFGTGGK 1020 

GGNGGNGGAG MINGGLGGFG GAGGGGAVDV AATTGGAGGN GGAGGF AS TG LGGPGGAGGP 10 8 0 

GGAGDFASGV GGVGGAGGDG GAGGVGGFGG QGGIGGEGRT GGNGGSGGDG GGGISLGGNG 1140 

GLGGNGGVSE TGFGGAGGNG GYGGPGGPEG NGGLGGNGGA GGNGGVSTTG GDGGAGGKGG 1200
MGGDGGNVGL GGDAGSGGAG GNGGIGTDAG GAGGAGGAGG NGGSSKSTTT GNAGSGGAGG 1260

NGGTGLNGAG GAGGAGGNAG VAGVS FGNAV GGDGGNGGNG GHGGDGTTGG AGGKGGNGSS 132 0 

GAASGSGWN VTAGHGGNGG NGGNGGNGSA GAGGQGGAGG SAGNGGHGGG ATGGD GGNGG 13 8 0 

NGGNS GNS TG VAGLAGGAAG AGGNGGGTSS AAGHGGSGGS GGS GTTGGAG AAGGNGGAGA 1440 

5 GGGSLSTGQS GGPRRQRWCR WQRRRWLGRQ RRRRWCRWQR RCRRQRWRWR CRQRRLRRQW 1500 

RQGRRRCRPW LHRRRGRQGR RWRQRRFQQR QRSRWQRR 1538
<212> Type : PRT
<211> Length : 1538

SequenceName : SEQ ID 146
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSFWTAPPV LASAASDLGG IASMISEANA MAAVRTTALA PAAADEVSAA IAALFSSYAR 60 

DYQTLSVQVT AFHVQFAQTL TNAGQLYAW DVGNGVLLKT EQQVLGVINA PTQTLVGRPL 12 0 

IGDGTHGAPG TGQNGGAGGI LWGNGGNGGS GAPGQPGGRG GDAGLFGHGG HGGVGGP GI A 180 

GAAGTAGLPG GNGAMGGSGG IGGAGGAGGN GGLLFGNGGA GGQGGSGGLG GSGGT GGAGM 240 

20 AAGPAGGTGG IGGIGGIGGA GGVGGHGSAL FGHGGINGDG GTGGMGGQGG AGGNGWAAEG 3 00 

I TVGIGEQGG QGGDGGAGGA GGIGGSAGGI GGS QGAGGHG GDGGQGGAGG SGGVGGGGAG 3 60 

AGGDGGAGGI GGTGGNGS I G GAAGNGGNGG RGGAGGMATA GSDGGNGGGG GNGGVGVGSA 420 

GGAGGTGGDG GAAGAGGAPG HGYFQQPAPQ GLPIGTGGTG GEGGAGGAGG DGGQGDIGFD 480 

GGRGGDGGPG GGGGAGGDGS GTFNAQANNG GDGGAGGVGG AGGTGGTGGV GADGGRGGDS 540 

25 GRGGDGGNAG HGGAAQFSGR GAYGGEGGSG GAGGNAGGAG TGGTAGS GGA GGFGGNGADG 60 0 

GISTGGNGGNGG FGGINGTFGT NGAGGTGGLG TLLGGHNGNI GLNGATGGIG STTLTNATVP 660 

LQLVNTTEPV VFISLNGGQM VP VLLDTGS T GLVMDSQFLT QNFGPVIGTG TAGYAGGLTY 720 

MYNTYSTTVD FGNGLLTLPT SVNWTSSSP GTLGNFLSRS GAVGVLGIGP NNGFPGTSSI 780 

VTAMPGLLNN GVLIDESAGI LQFGPNTLTG GITISGAPIS TVAVQIDNGP LQQAPVMFDS 840 

30 GGINGTIPSA LASLPSGGFV PAGTTISVYT SDGQTLLYSY TTTATNTPFV TSGGVMNTGH 9 00 

VTFAQQPIYV SYSPTAIGTT TFN 923
<212> Type : PRT
<211> Length : 923

SequenceName : SEQ ID 147

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MX GNGGAGGS GAP GAI GGAG GPAGLIGVGG AGGAGGDSAV AGVI GGAGGA GGAALLFGAG 60 

GAGGAGGSGG SGAAGGAGGA GGAGGLFASG GSGGFGGFAS TGTGGAGGTG GAGGLFASGG 120 

VGGTGGGAGS GGTGGVGGTG GAGGL FASGG AGGAGGS GGT GGAGGTGGAG GLFGAGGAGG 180 

LGGQGNHTGG HGGAGGSAGL LALGDGGAGG AGGAATTGTG GAGGAGGKAG LLFGSGGAGG 240 

45 SGGAAGTFGD TGNS GGAGGA GGKAGLLFGS GGAGGSGGAG GFANGSTGGA GGAGGGAGL I 30 0 

G1STGGNGGS GG TSVATGGAGN GGAGGAGGGA GLIGNGGNGG SGGMGDAPGG TGVGGIGGLL 3 60 

LGLDGANAPA STNPLHTAQQ QALAAVNAPI QAVTGRPLIG NGANGAPGSG APGGHGGWLF 420 

GGGGTGGSGV S GGAGGD GGA GGILFGAGGA GGAGGAVTGT GATGGS GGAG GGALL FGAGG 480 

AGGAGGS S GI GGFAAGGAGG PGGAGGLFNG GGAGGAGGSG VSGGAGGEGG AGGAGGLFAG 540 

50 GGAGGAGGSG NNVGGAGGAG GVGGLFGAGG AGGSGGGGSV AGDSGAGGNA GLLAPGLAGG 600 

AGGGGGQGFD TGGAGGPGGD AGLLVGS GGV GGAGGFGLTT GGPGAAGGDA GLLFGSGGAG 660 

GAGGSGRTDL GGAGGAGGKA GLIGNGGNGG AGGAGGNGGG DGGPGGAAFG LGNGGNGGNG 720 

GTGTSAGSPG AGGAGGSLIG AEGLPGLLP 749
<212> Type : PRT

<211> Length : 749

SequenceName : SEQ ID 148
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIAAPEA LVAVASDLAG I GS ALAEANA AALAPTTALL AAGADEVSAA IAALFGAHGQ 60 
AYQTVSAQAS AFHAQFVQAL TGGGGAYAAA EAANVSAAQS TDQRLLDLIN GPTQALLGRP 120 
65 LXGDGANGGP GQDGGPGGLL YGNGGNGGTS TTAGVAGGNG GAAGL I GNGG AGGGGGAGAA 180 
GGNGGAGGWL YGNGGAGGAG GTSVIPGVAG GNGGAGGSAG LWGTGGAGGD GGNGRSGPVN 240 
VAGSAGGNGG AGGAAGLFGD AGAGGNGGKG GAGGAAFSIN FTAGDGGAGG AGGSGGHALL 300
WGAGGAGGNG GSGGTGGAGG STAGAGGNGG AGGGGGTGGL LFGNGGAGGH GAAAGNGLAA 360

GNGVSSSGGG GAGGTGGAGG DGGAGGAGGN ARLWGVGGAG GAGGD GGAGG AGGKGGSGLS 420 

GNANGGAGGD SGRGGTGGAG GEGGAAGLLV GTGGHGGDGG AGGAAVKGGD GGAAAGTGIA 48 0 

GAGGRGGAGG SGGSGGDGGG GAAGPAGWLF GDGGAGGNGG AAAAGGAGGQ AGGGGGNGGN 540 

5 GGNGGNGGNG GNGATGGWLY GNGGAGGQGA TAGAGGAGAN GVS STNGGGT GGNGGIGGTG 60 0 

GSGGAGGNAG LLGVGGAGGH GAS GGAGDRG GAGGTGF I S S DGGAGGDGGD GGNGGAGGTG 660 

GLLFGAGGNG GPGGS GGAAD I GGNGGAGNG GGTDGNGGNG GSGGGAGSGG DGGGAGGNGA 720 

WLFGNGGAGG GGGKGGNGAG GGLGGGSFGL PGLNGSGGDG GDGGNGAPGG VLYGNGGAGG 780 

QGSSGGIGGP GATGGAGGKG GDGGDAQL I G DGGNGGNGGA GGTGGTPGPG GPGGS GGLGG 840 

LLFGQTGTAG VSP 853

<212> Type : PRT

<211> Length : 853

SequenceName : SEQ ID 149
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

20 MS YLWVPEL VAAAATDLAN IGSSISAANA AAAAPTTALV AAGGDEVSAA IAALFGAHAR 60 

AYQALSAQAA MFHEQFVRAL AAGGNS YAVA EAATAQSVQQ DLLNL INAPT QALLGRPLIG 120 

NGANGLPGTG QNGGDGGILY GNGGNGGSGG VNQAGGNGGN AGLWGNGGSG GAGGNATTAG 180 

RNGFNGGAGG SGGLLWGNGG AGGAGGNGGP APLVGGVGTT GGAGGNGGGA GLFYGFGGAG 24 0 

GNGGMGGVAP STGPSMGILP AGGVGGPGGS GGASALAFGS GGVGGAGGLG GPTDGTVQGV 3 00 

25 GGFGGQGGNG GQSGLLFGMA GAGGAGAAGG AGTGDTESFG GHGGAGGDGG AVGL I GNGGA 3 60 

GGTGSPGAW GGNGGVGGLG GAGSPGGLLY GTGGAGGNGG PGGDGGTGAT VGFAGS GGFG 420 

GAGGIAQLFG TGGMGGSGGG I GAGTTTWP PDVAPVGGTG GNGGRAGLLL GV GGMGGNGG 480 

ATSVGGTLYA AGGMGGD GGIi VWGNGGTGGS GGAGGAGSVG NGGAGGNAAL LFGNGGAGGA 540 

GGAGGIGAGG AGGFGAVIiFG MGGAGGSGAP GGI GAGGNGG MALLVGNGGN GGAGTGGAAG 600 

GAGGSGGLLF GQNGMPGP 618

<212> Type : PRT

<211> Length : 618

SequenceName : SEQ ID 150
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

4-0 MNFSVLPPEI NSALIFAGAG PEPMAAAATA WDGLAMELAS AAASFGSVTS GLVGGAWQGA 60 

SSSAMAAAAA PYAAWLAAAA VQAEQTAAQA AAMIAEFEAV KTAWQPMLV AANRADLVSL 120 

VMSNLFGQNA PAIAAIEATY E QMWAADVS A M S A YHAGAS A IASALSPFSK PLQNLAGLPA 18 0 

WLAS GAP AAA MTAAAGI PAL AGGPTAINLG IANVGGGNVG NANNGLANIG NANLGNYNFG 240 

SGNFGNSNIG SASLGNNNIG FGNLGSNNVG VGNLGNLNTG FANTGLGNFG FGNTGNNNIG 3 00 

45 IGLTGNNQIG IGGLNSGTGN FGLFNSGSGN VGFFNSGNGN FGIGNSGNFN TGGWNSGHGN 3 60 

TGFFNAGS FN TGMLDVGNAN TGSLNTGSYN MGDFNPGSSN TGTFNTGNAN TGFLNAGNIN 420 

TGVFNI GHMN NGLFNTGDMN NGVFYRGVGQ GSLQFSITTP DLTLPPLQIP GISVPAFSLP 480 

AITLPSLNIP AATTPANITV GAFSLPGLTL PSLNIPAATT PAN I TVGAF S LPGLTLPSLN 540 

IPAATTPANI TVGAFSLPGL TLPSLNIPAA TTPANITVGA FSLPGLTLPS LNIPAATTPA 600 

50 NITVGAFSLP GLTLPSLNIP AATTPANITV SGFQLPPLSI PSVAIPPVTV P P I TVGAFNL 660 

PPLQIPEVTI PQLTIPAGIT IGGFSLPAIH TQPITVGQIG VGQFGLPSIG WDVFLSTPRI 720 

TVPAFGI PFT LQFQTNVPAL QPPGGGLSTF TNGALIFGEF DLPQLWHPY TLTGPIVIGS 780 

FFLPAFNIPG IDVPAINVDG FTLPQITTPA ITTPEFAIPP IGVGGFTLPQ ITTQEIITPE 840 

LT INS IGVGG FTLPQITTPP ITTPPLTIDP INLTGFTLPQ ITTPPITTPP LTIDPINLTG 900 

55 FTLPQITTPP ITTPPLTIEP IGVGGFTTPP LTVPGIHLPS TTIGAFAIPG GPGYFNSSTA 960 

PS SGFFNSGA GGNS GFGNNG SGLSGWFNTN PAGLLGGSGY QNFGGLSSGF SNLGSGVSGF 1020 

ANRGILPFSV ASWSGFANI GTNLAGFFQG TTS 1053
<212> Type : PRT
<211> Length : 1053

SequenceName : SEQ ID 151

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MLYWASPDL MTAAATNLAE IGSAISTANG AAALPTVEW AAAADEVSTQ IAALFGAHAR 60

SYQTLSTQAA AFHSRFVQAL TTAAASYASV EAANASPLQV ALDVINAPAQ TLLGRPLIGN 120

GADGSTPGQA GGPGGLLYGN GGNGAAGGPN QAGGAGGNAG LIGNGGAGGA GGVGAVGGKR 180

GTGGLLFGNG GAGGQGGLGL AGINGGSGGQ GGHGGNAILF GQGGAGGPGG TGAMGVAGTN 240

PTPIGTAAPG SDGVNQIGNG GNTDLTGGAG GDGNAGSTTV NGGNGGTGGA ARNSSGGTGN 300

SFGGAGGAGG DGANGGDGGA GGEALTEGGA TAVSGAGGKG GNAEASGGAG GNGGKGGFAQ 360

ATTSVTGGNG GNGGNGHDSN APGGAGGSGG VGGDGGRGGL LAGNGGTGGA GGNGGTGGAG 420

APGGAGGAGG KADIANSLGD NATVTGGNGG TGGDGGSALG TGGAGGAGGL GGHGGAGGLL 480

IGNGGAGGAG GLGGAGGAGG AGGEGGAGGA GGEAIPGGAS TNSAGGDGGA GGTGGNGGDG 540

GAGGAPGLGG AGGAGGWLIG QSGSTGGGGA GGAGGAGGAG GAGGSGGAGG HGDTTSGKNG 600

SSGTAGFDGN PGQPG 615
<212> Type : PRT
<211> Length : 615

SequenceName : SEQ ID 152
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MHYSVLPPEI NSALIFAGAG SGPMLAAASA WDGLATELAS AAVSFGSVTA GLVGGSWQGR 60

SSVAMAAAAA PYAGWLAAAA TQAEQAATQA QVMVAEFEAV RLAMVQPALV AANRSGLISL 120

VISNLFGQNA PAIAAAEAAY EEMWALDVSA MAAYHSGASA VAVALPAFAL PLRLPAGLAA 180

GPAAWTALT TAVGMPTFAG RAIAASLGLA NVGGGNLGNA NNGLGNIGNA NLGNNNLGSG 240

NFGSFNIGSA NLGGNNIGIG NAGANNFGLA NLGNLNTGFA NAGIGNFGIA NTGNNNIGNG 300

LTGNNQIGIG GLNSGNGNVG LFNAGSANIG FFNSGNGNFG IGNSGNFSTG LFNPGHGNTG 360

FLNAGSFNTG MFDVGNANTG SFNVGHYNFG AFNPGPSNTG TFNTGGANTG WFNTGSINTG 420

AFNIGDMNNG LFNTGDMNNG VFYRGVGQGS LQFAITSPDL TLPSLEIPGI SVPAFSLPAI 480

TLPSLTIPAV TTPANVTVGA FDLPGLTVPS LTIPAAMTPA NITVGAFDLP GLTVPSLTIP 540

ATTTPANITV GAFNLPQLSI PSVTVPPITI PAGTALGAFN LPTLSIPSVT VPPITIPAGT 600

TVGGFTLPTI HTPLISTPQI SIGGFSTPGI ATQANSGVIN LPTFSLNGIT ITNLWFIPN 660

NITALQTNMP GVFPQIGGFA NTPPAFINTG TITVGGGQIN GVGFSIGAIN VTPFTLPNW 720

IQPWSLGGIS VDGFTLPEIS TQEFTTPALT ISPIGVGALS LPDITTQQFT TPELTIDPIT 780

LGGFTLPQLS IPAITTPAFT IDPIALGGFT LPQIMTPEIT TPPFAIDPIG LSGFTLPQVN 840

IPEITTPEFT XQPVGLAAFT TPALTIASIH LPSTTMGGFA IPAGPGYFNS SATPSLGFFN 900

AGIGGNSGFG NSGSGLSGWF NTSPVGLLAG SGYQNYGGLI SGFSNLGSGI SGFANTGTLP 960

FAVTSLVSGL ANIGNNLSGL FFQSTTP 987
<212> Type : PRT
<211> Length : 987

SequenceName : SEQ ID 153

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVWAPEV LAAAASDLAG IGSTLAQANA AALAPTTAVL AAGADEVSAA IASLFGAHGQ 60

AYQAVSAQMS AFHAQFMQAL TGAGGAYAAA EAVNVSAAQS VEQDLLAAIN ARFERIFGRP 120

LIGDGANGGP GQDGGPGGLL YGNGGNGGTS TTVGMAGGNG GAAGLIGNGG FGGGGGPGAA 180

GGNGGAGGWL FGNGGAGGAG GLGVAPGVPG GAGGAGGAGG VGGPAGLWGH GGAGGAGGAG 240

VAGAGGFEGT IGAGGAGGVG GAGGVGGAGG AGGWLYGDAG AGGDGGVGGA GGTGGLGNRG 300

GAGGAGGAGG VGGAGGAAGL WGGGGAGGVG GTGGGAGLGA QSVTFSSSLS GLSGGDGGAG 360

GAGGAGGAGG TGGWLYGGGG AAGSGGDGGT GGQGGAGGAG VFSLFGSGGG PGGNGGVGGV 420

GGVGGAGGRA GLFGVGGLGG AGGDAGDSGE GGFGGPGLAG GLFGNPGNGG VGGIGGDAAA 480

GGAGGAGGNG GAGGNGGWLF GNGGAGGSGG DGGAAGRGGA GNLGSAGGIN APAGNPGSGS 540

VGIGGAGGAG GTAGLFGDGG AGGAGGAGAA GGFGGISAAT PSAGSEGAMG GAGGVGGNAR 600

LLGTGGAGGV GGGGGAGGDG GRGGVATPGG QGGDAGDGGA GGAGGNGGGA SGAGGWLLGT 660

GGAGGAGGNG GNGGKAGFSP GPTNFGLNGA GGGGGVGGNG ATGPWLFGDG GPTPGSTGAG 720

AAGGHGGDAQ LIGNGGHGGA GGTGVPNGSG GAGGLSGLLF GEPGANG 767

<212> Type : PRT
<211> Length : 767

SequenceName : SEQ ID 154
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIANPEM LAAAATDLAG IRSAISAATA AAAAPTIQVA AAGADEVSLA ISALFGQHAQ 60

AYQALSAQAT TFHDQFVQAL TSGGNLYAAA ESHTVEQMVL NAINAPTQTL FGRPLIGDGA 12 0 

NGTAENPDGQ ISTGGLLFGNGG NGFTQTTAGV AGGNGGSAGL IGNGGAGGGG GAGAAGGLGG 18 0 

NGGWLYGNGG AGGIGGAGTG TGGHGGAGGA GGRAWLWGTG GAGGAGGDGG WLFGDGGAGG 240 

5 TGGNGGS GFN SLTSSVGGAG GAGGHAGLFG AGGTGGTGG I GGQNTETGPA ASNGGAGGAG 3 00 

GGGGYLVGDG GAGGTGGAGG KNSSGGATLT GGTGGTGGAG GAAGWLYGSG GAGGAGGAGG 360 

LNNAGGATGG TGGTGGAGGS GAWLYGNGGA AGAGGNGGNN TSAGTGGVGA SGGTGGNAGL 42 0 

I GAGGHGGAG GAGGNQTGGV GNGGAGGNGG AGGAGGQLYG NGGDGGNGGA GGANIAGGNG 48 0 

SDGGAAGHGG 2\GGS ARL IGA GGHGGDGGAG GNTAGRRADA IAGTGGDGGN GGNGGLLSGN 540 

10 AGAGGHGGAG GSSTATTTTG TPPTGATGGN GGNGGAGGTA GFTGSGGIGG NGGAGGTGGN 600 

AGVALSVGST GGLGGNGGSG GLGGGGGSLF GNGGAGGVGA TGGNGGS G I G PASVGGNGGK 660 

GGVGAAGGLA GQIGNGGSGG SGGAGGNGGT GDTAGNGGNG GAGAVGGNAQ LIGNGGNGGG 72 0 

GGNGGTGADG T 731
<212> Type : PRT
<211> Length : 731

SequenceName : SEQ ID 155
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MPGRFRNFGS QNLGSGNIGS TNVGSGNIGS TNVGSGNIGD TNFGNGNNGN FNFGSGNTGS 60
NNIGFGNTGS GNFGFGNTGN NNIGIGLTGD GQIGIGGLNS GSGNIGFGNS GTGNVGLFNS 120
GTGNVGFGNS GTANTGFGNA GNVNTGFWNG GSTNTGLANA GAGNTGFFDA GNYNFGSLNA 180
GNINSSFGNS GDGNSGFLNA GDVNSGVGNA GDVNTGLGNS GNINTGGFNP GTLNTGFFSA 240
MTQAGPNSGF KNAGTGNSGF GHNDPAGSGN SGIQNSGFGN SGYVNTSTTS MFGGNSGVLN 300
TGYGNSGFYN AAVNNTGIFV TGVMSSGFFN FGTGNSGLLV SGNGLSGFFK NLFG 354
<212> Type : PRT
<211> Length : 354

SequenceName : SEQ ID 156
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVLAMPEV XiGSAATDLAA LGSVLGAADA AAAATTTGIV AAAQDEVSAA IAALFSAHGR 60 

40 AYQVASAQAA AVHAQFVEAL SAGAGAYASA EAAGAAVLAN PAQSVQQDLL AAVNAQSVAL 120 

TGRPL I GNGA isTGAPGTGANG APGGWLLGNG GAGGSAAAGS GLPGGAGGAA GLFGTGGAGG 180 

AGGSSTVGDG ELAGGAGGSGG WLLGTGGVGG VGGLGAGAGG AGGVGGAGGL LGAGGHGGAG 240 

GLGAVTGGVG GTGGAGGLLA GLLAGPGGAG GTGGRGFLNN GGVGGAGGNA GLL FGAGGTG 3 00 

GSGGAGLGGD GGAGGAGGNT GVLFGNAGSG GTGGFGDTDG GAGGAGGDAG WLGSGGVGGA 3 60 

45 GGFGETGDGG VGGAGGKAGL LIGNGGAGGA GGQGAVTGGT GGAGGD GVL I GNGGNAGIGG 420 

TGP TAGDTGA GGISGLLLGA DGFNTPASAS PLHTLKQQAL AAINAPTQTL TGRPL IGNGT 480 

PGAVGSGATG APGGWLLGDG GAGGS GAAGS GAP GGAGGAA GLWGTGGAGG AGGSSAGGGG 540 

AGGAGGAGGW LLGDGGAGGI GGASTVLGGT GGGGGVGGLW GAGGAGGAGG TGLVGGDGGA 600 

GGAGGTGGLL AGLIGAGGGH GGTGGLSTNG DGGVGGAGGN AGMLAGP GGA GGAGGDGENL 660 

50 DTGGDGGAGG SAGLLFGSGG AGGAGGFGFL GGDGGAGGNA GLLLSSGGAG GFGGFGTAGG 720 

VGGAGGNAGW LGFGGAGGVG GSAGLI GTGG NGGNGGTGAN AGS PGTGGAG GLLLGQNGLN 780 

GLP 783
<212> Type : PRT
<211> Length : 783

SequenceName : SEQ ID 157

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSLVIATPQL LATAALDLAS I GSQVS AANA AAAMPTTEW AAAADEVSAA I AGL F GAHAR 60 

QYQALSVQVA AFHEQFVQAL TAAAGRYAST EAAVERSLLG AVNAPTEALL GRPL I GNGAD 120 

GTAPGQPGAA GGLLFGNGGN GAAGGFGQTG GSGGAAGLIG NGGNGGAGGT GAAGGAGGNG 180 

65 GWLWGNGGNG GVGGTSVAAG IGGAGGNGGN AGL FGHGGAG GTGGAGLAGA NGVNP TP GPA 240 

ASTGDSPADV SGIGDQTGGD GGTGGHGTAG TPTGGTGGDG ATATAGSGKA TGGAGGDGGT 30 0 

AAAGGGGGNG GDGGVAQGDI ASAFGGDGGN GSDGVAAGSG GGSGGAGGGA FVHIATATST 360
GGSGGFGGNG AASAASGADG GAGGAGGNGG AGGIiLFGDGG NGGAGGAGGI GGDGATGGPG 420
GSGGNAGIAR FDSPDPEAEP DWGGKGGDG GKGGSGLGVG GAGGTGGAGG NGGAGGLLFG 480
MGGNGGNAGA GGDGGAGVAG GVGGNGGGGG TATFHEDPVA GWAVGGVGG DGGSGGSSLG 540
VGGVGGAGGV GGKGGASGML IGNGGNGGSG GVGGAGGVGG AGGDGGNGGS GGNASTFGDE 600
NSIGGAGGTG GNGGNGANGG NGGAGGIAGG AGGSGGFLSG AAGVSGADGI GGAGGAGGAG 660
GAGGSGGEAG AGGLTNGPGS PGVSGTEGMA GAPG 694
<212> Type : PRT
<211> Length : 694

SequenceName : SEQ ID 158
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIAAPEV IAAAATDLAS LESSIAAANA AAAANTTALL AAGADEVSTA VAALFGAHGQ 60
AYQALSAQAQ AFHAQFVQAL TSGGGAYAAA EAAATSPLLA PINEFFLANT GRPLIGKTGTN 120
GAPGTGANGG DGGWLIGNGG AGGSGAAGVN GGAGGNGGAG GLIGNGGAGG AGGRASTGTG 180
GAGGAGGAAG MLFGAAGVGG PGGFAAAFGA TGGAGGAGGN GGLFADGGVG GAGGATDAGT 240
GGAGGSGGNG GLFGAGGTGG PGGFGIFGGG AGGDGGSGGL FGAGGTGGSG GTSIIMVGGN 300
GGAGGDAGML SLGAAGGAGG SGGSNPDGGG GAGGIGGDGG TLFGSGGAGG VCGLGFDAGG 360
AGGAGGKAGL LIGAGGAGGA GGGSFAGAGG TGGAGGAPGL VGNAGNGGNG GASANGAGAA 420
GGAGGSGVLI GNGGNGGSGG TGAPAGTAGA GGLGGQLLGR DGFNAPASTP LHTLQQQILM 480
AINEPTQALT GRPLIGNGAN GTPGTGADGG AGGWLFGNGG NGGHGATGAD GGDGGSGGAG 540
GIIiSGIGGTG GSGGIGTTGQ GGTGGTGGAA LLIGSGGTGG SGGFGLDTGG AGGRGGDAGL 600
FLGAAGTGGQ AALSQNFIGA GGTAGAGGTG GLFANGGAGG AGGFGANGGT GGNGLLFGAG 660
GTGGAGTLGA DGGAGGHGGL FGAGGTGGAG GSSGGTFGGN GGSGGNAGLL ALGASGGAGG 720
SGGSALNVGG TGGVGGNGGS GGSLFGFGGA GGTGGSSGIG SSGGTGGDGG TAGVFGNGGD 780
GGAGGFGADT GGNSSSVPNA VLIGNGGNGG NGGKAGGTPG AGGTSGLIIG ENGLNGL 837
<212> Type : PRT
<211> Length : 837

SequenceName : SEQ ID 159

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIAVPET IAAAATDLAD LGSTIAGANA AAAANTTSLL AAGADEISAA IAALFGAHGR 60
AYQAASAEAA AFHGRFVQAL TTGGGAYAAA EAAAVTPLLN SINAPVLAAT GRPLIGNGAN 120
GAPGTGANGG DAGWLIGNGG AGGSGAKGAN GGAGGPGGAA GLFGNGGAGG AGGTATANNG 180
IGGAGGAGGS AMLFGAGGAG GAGGAATSLV GGIGGTGGTG GNAGMLAGAA GAGGAGGFSF 240
STAGGAGGAG GAGGLFTTGG VGGAGGQGHT GGAGGAGGAG GLFGAGGMGG AGGFGDHGTL 300
GTGGAGGDGG GGGLFGAGGD GGAGGSGLTT GGAAGNGGNA GTLSLGAAGG AGGTGGAGGT 360
VFGGGKGGAG GAGGNAGMLF GSGGGGGTGG FGFAAGGQGG VGGSAGMLSG SGGSGGAGGS 420
GGPAGTAAGG AGGAGGAPGL IGNGGNGGNG GESGGTGGVG GAGGNAVLIG NGGEGGIGAL 480
AGKSGFGGFG GLLLGADGYN APESTSPWHN LQQDILSFIN EPTEALTGRP LIGNGDSGTP 540
GTGDDGGAGG WLFGNGGNGG AGAAGTNGSA GGAGGAGGIL FGTGGAGGAG GVGTAGAGGA 600
GGAGGSAFLI GSGGTGGVGG AATTTGGVGG AGGNAGLLIG AAGLGGCGGG AFTAGVTTGG 660
AGGTGGAAGL FANGGAGGAG GTGSTAGGAG GAGGAGGLYA HGGTGGPGGN GGSTGAGGTG 720
GAGGPGGLYG AGGSGGAGGH GGMAGGGGGV GGNAGSLTLN ASGGAGGSGG SSLSGKAGAG 780
GAGGSAGLFY GSGGAGGNGG YSLNGTGGDG GTGGAGQITG LRSGFGGAGG AGGASDTGAG 840
GNGGAGGKAG LYGNGGDGGA GGDGATSGKG GAGGNAWIG NGGNGGNAGK AGGTAGAGGA 900
GGLVLGRDGQ HGLT 914
<212> Type : PRT
<211> Length : 914

SequenceName : SEQ ID 160
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSLVIVTPET VAAAASDVAR IGSSIGVANS AAAGSTTSVL AAGADEVSAA IATLFGSHAR 60
EYQAISTQVA AFHDRFAQTL SAAVGSYVSA EATNAAPLAT LEHNVLNALN APTQALLGRP 120
LIGDGAAGAP GTGQAGGAGG ILWGNGGAGG SGAPGQVGGA GGAAGLFGTG GAGGAGGAGA 180
AGGAGGSGGW LLGNGGVGGA GGQSLLGGAT GGAGGNAGLF GVGGTGGPGG PGGPGGVGGT 240
GGAGGLGGTL YGAGGHGGAG GPGPIGGVGG HGGVGGAAGL LGVGGHGGAG GHGAEGVAGA 300
AGEDLSPHGT SGGVGGDAGD GGTGGRGGWL AGAGGAGGAG GVGGTGGAGG AGFSRALIVA 360
GDNGGDGGNG GMGGAGGAGG PGGAGGLISL LGGQGAGGAG GTGGAGGVGG DRGAGGPGNQ 420
AFNAGAGGAG GHGGDPGAGG AGGTGGAGSI TGAQGAIGAT PTSGGNGGAG GNGANATTAG 480
TNGANGGPGG HGGLVGNGGA GGNGANGAAG TNASDSGAVG GKGNSGGNGG QGGAGGDGGT 540
LAGNGGAGGT GGRGADGG-XjG GSGAEGANAT TAGERGQDGG KGGNGGVGGT GGNAVAPGAN 600
GGHGGNGGNP GFSGAGGLGG LSGDGVTRAA QGATPDFADT GGKGGNGGNG ANAVAPGGTG 660
ASGGAGGNAG AGGKGGENII GDGGGGNGGA GGKGGAGTLL GLTVFGDNGG AGVLGDSTDP 720
DGSGGAGGAG GAGGAGGD 3? T I 741

<212> Type : PRT
<211> Length : 741

SequenceName : SEQ ID 161
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

20 MS FVTAAPEM LATAAQNV1AN IGTSLSAANA TAAASTTSVL AAGADEVSQA IARLFSDYAT 60 

HYQSLNAQAA AFHHSFVQTL NAAGGAYS S A EAANASAQAL EQNLLAVINA PAQALFGRPL 120 

IGNGANGTAA SPNGGDGGIL YGNGGNGFSQ TTAGVAGGAG GSAGLIGNGG NGGAGGAGAA 180 

GGAGGAGGWL LGNGGAGGPG GPTDVPAGTG GAGGAGGDAP L I GWGGNGGP GGFAAFGNGG 240 

AGGNGGASGS L FGVGGAGGV GGSSEDVGGT GGAGGAGRGL FLGLGGDGGA GGTSNNNGGD 3 00 

25 GGAGGTAGGR LFSLGGDGGN GGAGTA I GSN AGDGGAGGDS SALIGYAQGG SGGLGGFGES 3 60 

TGGDGGLGGA GAVL I GTGVG GFGGLGGGSN GTGGAGGAGG TGATLIGLGA GGGGGI GGFA 42 0 

VNVGNGVGGL GGQGGQGAAL IGLGAGGAGG AGGATWGLG GNGGDGGDGG GLFSIGVGGD 480 

GGNAGNGAMP ANGGCTGGKTAG VIANGSFAPS FVGFGGNGGN GVNGGTGGSG GILFGANGAN 540 

GPS 543

<212> Type : PRT
<211> Length : 543

SequenceName : SEQ ID 162
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSYVIiATPEM VAAAANNLAQ IGSTLSAANA AALAPTTGVL AAGADEVSAA VASLFSGHAQ 60 
40 AYQTLGTQAA AFHERFIQAIi STAAGAYGSA EAANASPLQQ ALNVINAPTQ TLLGRPLIGN 12 0 

GTNGAPGTGQ AGGPGGLL»YG NGGNGGSGGV GQAGGAGGSA GLIGI GGTGG AGGAGAVGGV 180 
GGNGGWLYGN GGAGGLGGTG VAGVNGGMGA AGGAGGNAYL FGSGGAGGQG GMGAAGADGV 240 
NPTPTGTADA GS TGTDQTIjG GNAIGGNGGP GDAGDAMTSG GAGGSGGNAV STVNGDAVGG 3 00 

EGGKGGEGAY GGAGGAGGSA AS IGNAAIGG NGGAGGNAQA PGGVGGAGGE GGDAQVGTNS 3 60 

45 PSNAEAGNGG SGGNGFDSFA SGGTGGAGGT GGAGGRGGLL IGDGGAGGAG GVGGTGGSGA 42 0 

PGGGGGAGGD GGAANTDSAG SSRKAFGGDG GVGGDGASAL GTGGEGGIGG QGGNGGAGGL 480 
L I GNGGAGGV GGTAGAGG-TG GSGGAGGAGG AGGGGTNSGP GAAFGGNGNT GGNGGNGGAP 540 
GALGGKGGSG GL I GRAGS X) G GVGAGGAGGA GGAGGTGGEG GTGGDGKTTD GNPGMGGSPG 600 
SAGQPG 606
<212> Type : PRT
<211> Length : 606

SequenceName : SEQ ID 163
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFLFAQPEM LGAAATDIxAS I GS AI S TANA AAAAATTRVL AAGADEVSAA VAALFSGHAQ 60 

60 TYQALRTQAA AFHQQIVQTL TSTAGAYASA EAANVEQQLL GAINAPTMAL LGRPL IGHGA 12 0 

DGAPGTGQAG GAGGI L YG3SFG GNGGS GATGQ AGGAGGAAGL IGHGGAGGLG GTGAS GGAGG 18 0 

AGGWLWGNGG AGGNGGVGVA GDPGGVGGAG GAGGAAGLWG SGGSGGTGGQ GGVGGGKSGD 240 

GGTGGIGGAG GGGGWLHGDG GAGGHGGQGG TGVSS GGNGG AGGTGGD GRG LSGSGGAGGR 3 00 

GGQTGVGGKV GENNFGGAGG AGGTGGL I GN GGAGGNGGQG AI S GAGGAGG NAWLIGDGGA 3 60 

65 GGNGGDIRGQ GGGAGGAGGA GGQLIGNGGT GGAGGTVTSP NGL GGAGGAG GSAGLIGHGG 42 0 

TGGAGGHSAQ GPDGNGGI GG AGGAGGNGGQ LYGTGGTGGT GGKGGDGFGV FGKGGAGGTG 480 

GRGGAAGLIG DAGTGGTGGK GGTAGED GTG GNGGTGGNGG AAVLIGNGGG GGAGGNGGAG 540 



WO 2005/076010 



54/341 



PCT/IN2005/000037 



NDGTPGNGGG GGVGGTGGTL FGQPGQPGPP GQPGPA 576
<212> Type : PRT
<211> Length : 576

SequenceName : SEQ ID 164
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MWTSQMIVAP AFVDAAAKDL ATIGSAISRA NAEALVPITA LLPAGADDVS AAIAALFATH 60 
GQAYQELSAH AVAFHEQ FVQ LMSAGAAQYA SAEAANSSPL Q I VGQTALDA INSPVQTLTG 120 
RPLIGNGANG VAGTGQNGGD GGWLYGNGGN GGS GGTGQNG GNGGSAGLWG SGGNGGQGGA 180 
GANGAAGQPG KAGGSGGNGG AGGW I YGHGG HGGAGGNGGN ATAPGGASAG FDGGAGGNGG 240 

15 SGGRGGLLFG NGGNGSVGGM GGQGTNDTAG DSAGSGGLGG NGGNGAQGGW LIGNGGQGGD 3 00 

SGAGGGTDST QTGVMNGASG GS AG I AGNGG DAGLVGNGGA GGNGGNGAAG SALGTTIFGG 3 60 

SGGVGGSGGD GGISTGGWLFGS GAS GGNGGQG GDAGTNGFAG FGGSAGGGGW VGAVNFGPIS 420 
VQGFGLFGHG GDGGNGGDVG AGSLSIQFGA SGGDGGQGGV LYGNGGNGGN AGS GGGTGF E 480 
GSAGQGGAAI LIGNGGAGGN GATGGTGVGN IIQEAGGDGS DGGAGGS GGL LFGSGGAGGI 540 

20 GGAGGVGGSG NDGGNGGDGG QGGASGLGIG NGGPGGS GGT GGAGGTGGSA GTGGAGGDGG 6 00 

NAALLIGTGG DGGDGVPPAP GGQGGKGGLI GLPGQNGQP 639

<212> Type : PRT
<211> Length : 639

SequenceName : SEQ ID 165

SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSWVMVSPEL WAAAADLAG IGSAISSANA AAAVNTTGLL TAGADEVSTA IAALFGAQGQ 60
AYQAASAQAA AFYAQFVQAL SAGGGAYAAA EAAAVSPLLA PINAQFVAAT GRPLIGNGAN 120
GAPGTGANGG PGGWLIGNGG AGGSGAPGAG AGGNGGAGGL FGSGGAGGAS TDVAGGAGGA 180
GGAGGNAGML FGAAGVGGVG GFSNGGATGG AGGAGGAGGL FGAGRERGSG GSGNLTGGAG 240
GAGGNAGTLA TGDGGAGGTG GASRSGGFGG AGGAGGDAGM FFGSGGSGGA GGISKSVGDS 300
AAGGAGGAPG LIGNGGNGGN GGASTGGGDG GPGGAGGTGV LIGNGGNGGS GGTGATLGKA 360
GIGGTGGVLL GLDGFTAPAS TSPLHTLQQD VINMVNDPFQ TLTGRPLIGN GANGTPGTGA 420
DGGAGGWLFG NGGNGGQGTI GGVNGGAGGA GGAGGILFGT GGTGGSGGPG ATGLGGIGGA 480
GGAALLFGSG GAGGSGGAGA VGGNGGAGGN AGALLGAAGA GGAGGAGAVG GNGGAGGNGG 540
LFANGGAGGP GGFGSPAGAG GIGGAGGNGG LFGAGGTGGA GGGSTLAGGA GGAGGNGGLF 600
GAGGTGGAGS HSTAAGVSGG AGGAGGDAGL LSLGASGGAG GSGGSSLTAA GWGGIGGAG 660
GLLFGSGGAG GSGGFSNSGN GGAGGAGGDA GLLVGSGGAG GAGASATGAA TGGDGGAGGK 720
SGAFGLGGDG GAGGATGLSG AFHIGGKGGV GGSAVLIGNG GNGGNGGNSG NAGKSGGAPG 780
PSGAGGAGGL LLGENGLNGL M 801
<212> Type : PRT
<211> Length : 801

SequenceName : SEQ ID 166
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

GQSYQAVSAQ AAAFHDRFVQ LLNAGGGSYA SAEIANAQQN LLNAVNAPTQ TLLGRPLVGD 60
GADGASGPVG QPGGDGGILW GNGGNGGDST SPGVAGGAGG SAGLIGNGGR GGNGAPGGAG 120
GNGGLGGLLL GNGGAGGVGG TGDNGVGDLG AGGGGGDGGL GGRAGLIGHG GAGGNGGDGG 180
HGGSGKAGGS GGSGGFGQFG GAGGLLYGNG GAAGSGGNGG DAGTGVSSDG FAGLGGSGGR 240
GGDAGLIGVG GGGGGNGGDP GLGARLFQVG SRGGDGGVGG WLYGDGGGGG DGGNGGLPFI 300
GSTNAGNGGS ARLIGNGGAG GSGGSGAPGS VSSGGVGGAG NPGGSGGNGG VWYGNGGAGG 360
AAGQGGPGMN TTSPGGPGGV GGHGGTAILF GDGGAGGAGA AGGPGTPDGA AGPGGSGGTG 420
GLLFGVPGPS GP3DG 434
<212> Type : PRT
<211> Length : 434

SequenceName : SEQ ID 167
SequenceDescription :



Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MAHFSVLPPE INSLRMYLGA GSAPMLQAAA AWDGLAAELG TAASSFSSVT TGLTGQAWQG 60
PASAAMAAAA APYAGFLTTA SAQAQLAAGQ AKAVASVFEA AKAAIVPPAA VAANREAFLA 120
LIRSNWLGLN APWIAAVESL YEEYWAADVA AMTGYHAGAS QAAAQLPLPA GLQQFLNTLP 180
NLGIGNQGNA NLGGGNTGSG NIGETGNKGSS NLGGGNIGNN NIGSGNRGSD NFGAGNVGTG 240
NIGFGNQGPI DVNLLATPGQ NNVGLGNIGN NNMGFGNTGD ANTGGGNTGN GNIGGGNTGN 300
NNFGFGNTGN NNIGIGLTGN NQMGINLAGL LNSGSGNIGI GNSGTNNIGL FNSGSGNIGV 360
FNTGANTLVP GDLNNLGVGN SGNAISTIGFGN AGVLNTGFGN ASILNTGLGN AGELNTGFGN 420
AGFVNTGFDN SGNVNTGNGN SGNINTGSWN AGNVNTGFGI ITDSGLTNSG FGNTGTDVSG 480
FFNTPTGPIiA VDVSGFFNTA SGGTVINGQT SGIGNIGVPG TLFGSVRSGL NTGLFNMGTA 540
ISGLFNLRQL LG 552
<212> Type : PRT
<211> Length : 552

SequenceName : SEQ ID 168
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFLIASPEA LAATATYLTG IGSAISAANA VAAAPTTEIL AAGTDEVSTA ISALFGAHAQ 60
AYQALSAHVA AFHDQFVHTL TAGAGSYMAA EAAAASPLQA LQLELLNAIN APTLALLGRP 120
LIGDGTDAAP GSGGAGGAGG ILIGNGGTGG ASDLAGTGRG GVGGAGGAGG LFGIGGAGGG 180
CGSAVAIGGD GGAGGAGGVF SGGGAGGAGD AIGGSGGAGG TGGIiLGGGGG AGGAGGAGGN 240
GGGASNSASI GGDGGSGGAG GMLYTGAGGVG GNGGAAVAIG GDGGAGGRAG AIGNGGDGGN 300
GGTSNTPGGS GGDGGNGGNA GLIGNGGNGG NAEIVISGGS VAGTGGNGGL LLGFNGTNGL 360
P 361

<212> Type : PRT
<211> Length : 361

SequenceName : SEQ ID 169
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

AQASPAAHGG SGGAGGNGGA GSAGNGGAGG AGGNGGAGGN GGGGDAGNAG SGGNGGKGGD 60
GVGPGSTGGA GGKGGAGANG GSSKTGNARGG NAGNGGHGGA GGSGDTGGAG GAGGQGGFGG 120
TGGSGSGIGG GAGGNGGNGG AGGTGWLGG KGGDGGNGDH GGPATNPGSG SRGGAGGSGG 180
NGGAGGNATG SGGKGGAGGN GGDGSFGATS GPASIGVTGA PGGNGGKGGA GGSNPNGSGG 240
DGGKGGNGGA GGNGGSIGAN SGIVGGSGGA GGAGGAGGNG SLSSGEGGKG GDGGHGGDGV 300
GGNSSVTQGG SGGGGGAGGA GGSGFFGGKG GFGGDGGQGG PNGGGTVGTV AGGGGNGGVG 360
GRGGDGVFAG AGGQGGLGGQ GGNGGGSTGG NGGLGGAGGG GGNAPDGGFG GNGGKGGQGG 420
IGGGTQSATG LGGDGGDGGD GGNGGNSGAK AGGAGGKGQA GQPNSGTEPG FGGDGGLGGA 480
GATP 484
<212> Type : PRT
<211> Length : 484
SequenceName : SEQ ID 170

SequenceDescription :

Sequence

<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MKKSKILRKF LATASLCGTL FTNSNATGTI IPNNGSVSLN TDAGLVGGVF NNGDIIQIVN 60 

GGREIKISAD KANAIIGGIN TLKELPDFGG VEVSQNVSIG PLNAGEDLNT NFGPLKFISN 12 0 

NVTSIITGVG TKTFSNIDFA GKNATLQINK DLNITTKIDN TVAGNNGS I T FEGSGIISNH 180 

60 IGYTNSLLGI NVGNGEAKIY APEANNITIN AKNINLTHNN SILTLCDGNI TTLKGNINNT 240 

TEIDGQGILN LAYDLGSSSI ITGDIGNIGS LDTINVLLGS ATFNSTILKA TNINLKHNTS 30 0 

TLNLDDNI IV I GNI KGNNNK DILNTFKVHGT NLDNEMI I PA PQKTHGTLNF KGNATLNGNI 360 

NNLNILKFSG GHGKTLNLQG NTKVDNLVFA DSVLDSGTIS VNGLLDTDCV TFNNS NVNGG 420 

TLIINAKNTI SAKLLNATKA KIQINANLTM NHPSAGDISD IRIADNTIYT IDAKNGNVNL 480 

65 LNNNAKIIFE GADSMLALIN TGVTADRTFT IYNNLNQSGN DEYGIVKIEA IKKVTTIANQ 540 

SGPYTIGQDN THRLKEL IVE GAGDIIIDDT IFTKLLSINS TGQITFNRTL DLGAGGNIAF 600 

GKHGTLWNG VTGSITTSEN NQGILTINSG NITGVIGTNE LGLKLVNIGA DPVTCSANVF 660
ASVALTNPSS VLILADGVTL TGEVTTKNNT KGVLSLGTGS NITGQIGTNS AALEKINIGA 720
GASNIDSNIY AGSTVLTDQT SELTLNRTDW VNSNIITTAG NNSGKLIFTG NGGITGNIGA 780
MGAALQEWF NGTTNIGGTA NSQNFTVAHS AANWITGLT TGALKYKDTG T1IAHGGLVG 840
DIDFNNKAGK FILGDGAMID GSVLCNGGVA GTLDFIGDGN VTQNIGADNA NSISTINIQG 900
DNTKNVTIAN DIFVDNIHFT NGGILQLtGGN LTTHNIDFGA NGGTLEFNGN NTYNLNAIIV 960
NGQNGILNAF TNLKASDDTI GTVKIINIGQ IGTPQNFTIQ VNNKNLTLVS SVNSSINFGD 1020
ANSQLILSAP VDQTIKFINN LNETGGGIIT LDSNGNNLTI SGNNGIKLGS KGNELSSLNI 1080
KGKVTVTNDL DIQNIHQLNI NNGALFDDQS LTSAKIKNIN IGTVAGGATY TLDAINDNFD 1140
LNTSGMVFKH QDSILELKNS SNTNDHTITL TSABDPGNNQ FGIIKLITDT NKLTIDNNGN 1200
VAYTLGTANH MLKQLTFASI DNGAIALKVG INVENVTLNI KDIELNEVNA NVLFNKNTTY 1260
TATGNINGHV DFQGNAGVIN LNDDIEIDGS VTSTGNVNGT LNFNGSGKVT GLINNIVMLQ 1320
AGAGDVSLSA SGNYSITEIQ GNGNNNLTFA ANSHLTTDIN KTGGQDLNLV FINGGSVSGS 1380
IGANAAVGDI IINAGSVNFS NTLKSGNIVI SDGATMQVNN MVTATDISGK NANNGTLKLN 1440
NHTPINITST LGNNNAIGTI EVANNDVTIT GTLQAQNIHF SNATQAATIiT LGAASQVTNJ 1500
TTAGNNIHTL EVTDFDTGND G11GDANISTRL KSIELTGNGT VTINSPHVYS SITTANNAQG 1560
NVKLNIEGGI TYDIiGSKIKS LANVQISEDT TIRGDVYSKY LNIDAGKTIN FDRGDNNMNP 1620
KNLDIPDALI DLDVLPRSLS LFNYFTDIKA DNLNFADDTA TANFKDAWI DAHIDNGGIL 1680
KFNDNAWLTQ EIKNANIIEI ASDKFMLLQK NIKAATLIAD NANIiVLLDNV EVNTNLNVRD 1740
IVLDLANYEL KYTGNVTHNG LI/TIITYFDT ALQKGGHILV SQGSNVDMSD LDNLIIKIKA 1800
HSDITNITSD TKHQIVKLET GAIYTPVPQT KVIIDASEEQ NKFVKWVADA NGLVLLTDTG 1860
GRDDTGGRDD TRGRGNTDNG CRDNCDVGNI SNNSSNEAGG SSSDKNYGIT DWPIFDPSP 1920
ILDYTKNNYV ASGIANQLIN HVKDFGNTTD AGKLLNDLGF MSPNRVTETL DRLSNRINVN 1980
GLNEGWGLN GIEVENFLTD IAINMDJSTFTA KEIGNRLEEL SDANTVNGLN KTNTLLNNKI 2040
NLKRLNTNNQ AIIAAGDEDN IVTGIWGMSF YGKIKQNSKM SASGYQSNTG GGIIGFDYNI 2100
DNSIVIGAAY TMADSKVKHK NDKNGDRTKA KSNIYSIYGL YNWLTNNFFV EAIGVYGRNK 2160
IKNYEKRITT ITDQIAIGKF INTFYSYELL GGYNYLISHR TTITPMFGMR YATFKNNGYK 2220
ENNTTFQMLS IKKNYYDKFE TILGLNSVTH YLSQDIIIKP ELHWFINYQC KNKLPNIDAR 2280
LDGIDEPLTT IRFKPAKITY NLGGGISTKN NMIEFGIRYN LSLAKKYTAH QGSLKIKVNL 2340

<212> Type : PRT
<211> Length : 2340

SequenceName : SEQ ID 171
SequenceDescription :

Sequence

<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MAQKPNFLKK IISAGLVTAS TATIVAGFSG VAMGAAMQYN RTTNAAATTF DGIGFDQAAG 60 

40 ANIPVAPNSV rTANANNPIT FNTPLTGHLNS LFLDTANDLA VTINEDTTLG FITNIAQQAK 120 

FFNFTVAAGK ILNITGQGIT VQEASNTINA QNALTKVHGG AAINANDLSG LGSITFAAAP 180 

SVLEFNLINP TTQEAPLTLG ANSKIVKTGGN GTLNITNGFI QVSDNTFAGI KTINIDDCQG 240 

LMFNSTPDAA NTLNLQVGGN TINFNGXDGT GKLVLVSKNG AATEFNVTGT LGGNLKGIIE 3 00 

LNTAAVAGKL ISQGGAANAV I GTDNGAGRA AGFIVSVDNG NAATISGQVY AKNMVIQSAN 3 60 

45 AGGQVTFEHI VDVGLGGTTN FKTADSKVII TENSNFGSTN FGNLDTQIW PDTKILKGNF 420 

I GD VKNNGNT AGVI TFNANG ALVSASTDPN I AVTNINAI E AEGAGWELS GIHIAELRLG 48 0 

NGGS I FKLAD GTVINGPVNQ NALMNNNALA AGS I QLDGS A IITGDIGNGG VNAALQHITL 540 

ANDASKILAL DGANIIGANV GGAIHFQANG GTIKLTNTQN NIWNFDLDI TTDKTGWDA 600 

SSLTNNQTLT INGSIGTWA NTKTLAQLNI GSSKTILNAG DVAINELVIE NNGSVQLNHN 660 

50 TYLITKTINA ANQGQIIVAA DPLNTNTTLA DGTNLGSAEN PLSTIHFATK AANADSILNV 720 

GKGVNLYANN I TTNDANVGS IjHFRSGGTSI VSGTVGGQQG HKLNNLILDN GTTVKFLGDT 780 

TFNGGTKI EG KSILQISNNY TTDHVESADN TGTLEFVNTD PITVTLNKQG AYFGVLKQVI 840 

ISGPGNIVFN EIGNVGIVHG IAANSISFEN ASLGTSLFLP SGTPLDVLTI KSTVGNGTVD 900 

NFNAPIVWS GIDSMINNGQ IIGDKKNTIA LSLGSDNSIT VNANTLYSGI RTT KNNQGTV 960 

55 TLSGGMPNNP GTIYGLGLEN GSPKLKQVTF TTD YNNLGS I IANNVTINDY VTLTTGGIAG 1020 

TDFDAKITLG SVNGNANVRF VDSTFSDPRS MIVATQANKG TVTYLGNALV SNIGSLDTPV 1080 

ASVRFTGNDS GAGLQGNIYS QNIDFGTYNL TILNSNVILG GGTTAINGEI DLLTNNLIFA 114 0 

NGTSTWGDNT SISTTLNVSS GNIGQWIAE DAQVNATTTG TTTIKIQDNA NANFSGTQAY 1200 

TLIQGGARFN GTLGAPNFAV TGSNIFVKYE LIRDSNQDYV LTRTNDVLNV VTTAVGNSAI 1260 

60 ANAPGVSQNI SRCLESTNTA AYNNMLLAKD P S D VAT F VGA IATDTSAAVT TVNLNDTQKT 1320 

QDLLSNRLGT LRYL SNAETS DVAGSATGAV SSGDEAEVSY GVWAKPFYNI AEQDKKGGIA 13 80 

GYKAKTTGW VGLDTLASDN LMIGAAIGIT KTDIKHQDYK KGDKTDINGL SFSLYGSQQL 1440 

VKNFFAQGNA IFTLNKVKSK SQRYFFESNG KMSKQIAAGN YDNMTFGGNL IFGYDYNAMP 15 0 0 

NVLVT PMAGL SYLKSSNENY KETGTTVANK RINSKFSDRV DL I VGAKVAG STVNITDIVI 1560 

65 YPEIHSFWH KVNGKIiSNSQ SMLDGQTAPF ISQPDRTAKT SYNIGLSANI KSDAKMEYGI 162 0 

GYDFNSASKY TAHQGTLKVR VNF 1643
<212> Type : PRT
<211> Length : 1643

SequenceName : SEQ ID 172
SequenceDescription :

Sequence

<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString :

MARI ILEAHD VWEDGTGYQM LWDADHNQYG ASIPEESFWF ANGTIPAGLY DPFEYKVPVN 60 

10 ADAS FS PTNF VLDGTAS AD I PAGTYDYVII NPNPGIIYIV GEGVSKGNDY WEAGKTYHF 120 

TVQRQGPGDA ASWVTGEGG NEFAPVQNLQ WSVSGQTVTL TWQAPASDKR TYVLNESFDT 18 0 

QTLPNGWTMI DAD GDGHNWL STINVYNTAT HTGDGAMFSK SWTASSGAKI DLSPDNYLVT 240 

PKFTVPENGK LSYWVSSQEP WTNEHYGVFL S TTGNEAANF TIKLLEETIiG SGKPAPMNLV 3 00 
KSEGVKAPAP- YQERTIDLSA YAGQQVYLAF RHFGCTGIFR LYLDDVAVSG EGSSNDYTYT . 3 60 

15 VYRDNWIAQ NLTATTFNQE NVAPGQYNYC VEVKYTAGVS PKVCKDVTVE GSNEFAPVQN 42 0 

LTGSAVGQKV TLKWDAPNGT PNPNPGTTTXi SESFENGIPA SWKTIDADGD GNNWTTTPPP 48 0 

GGSSFAGHNS AICVSSASYI NFEGPQNPDN YLVTPELSLP NGGTLTFWVC AQDANYASEH 540 

YAVYASSTGN DASNFANALL EEVLTAKTW TAP EAI RGTR VQGTWYQKTV QLPAGTKYVA 60 0 

FRHFGCTDFF WINLDDVEIK ANGKRADFTE TFESSTHGEA PAEWTTIDAD GDGQGWLCLS 660 

20 SGQLGWLTAH GGTNWASFS WNGMALNPDN YLISKDVTGA TKVKYYYAVN DGFPGDHYAV 72 0 

MIS KTGTNAG DFTWFEETP NGINKGGARF GLS TEANGAK PQSVWIERTV DLPAGTKYVA 780 

FRHYNCSDLN YILLDDIQFT MGGSPTPTDY TYTVYRDGTK IKEGLTETTF EEDGVATGNH 840 

EYCVEVKYTA GVSPKECVNV TVDPVQFNPV QNLTGSAVGQ KVTLKWDAPN GTPNPNPGTT 90 0 

TLSESFENGI PAS WKTI DAD GDGNNWTTTP PPGGTSFAGH NSAICVSSAS YINFEGPQNP 960 

25 DNYLVTPELS LPNGGTLTFW VCAQDANYAS EHYAVYASST GNDASNFANA LLEEVLTAKT 102 0 

WTAPEAIRG TRVQGTWYQK TVQLPAGTKY VAFRHFGCTD FFWINLDDVE IKANGKRADF 108 0 

TETFESSTHG EAPAEWTTID ADGDGQGWLC LSSGQLDWLT AHGGTNWAS FSWNGMALNP 1140 

DNYLISKDVT GATKVKYYYA VNDGFPGDHY AVMISKTGTN AGDFTWFEE TPNGINKGGA 120 0 

RFGLSTEANG AKPQSVWIER TVDLPAGTKY VAFRHYNCSD LNYILLDDIQ FTMGGSPTPT 1260 

30 DYTYTVYRDG TKIKEGLTET TFEEDGVATG NHEYCVEVKY TAGVSPKECV NVTVDPVQFN 1320 

PVQNLTGSAV GQKVTLKWDA PNGTPNPNPG TTTLSESFEN GIPASWKTID ADGDGNNWTT 13 80 

TPPPGGTSFA GHNSAICVSS ASYINFEGPQ NPDNYLVTPE LSLPNGGTLT FWVCAQDANY 1440 

AS EHYAVYAS STGNDASNFA NALLEEVLTA KTWTAPEAI RGTRVQGTWY QKTVQLPAGT 150 0 

KYVAFRHFGC TDFFWINLDD VEIKANGKRA DFTETFESST HGEAPAEWTT IDADGDGQGW 1560 

35 LCLSSGQLGW LTAHGGTNW ASFSWNGMAL NPDNYLISKD VTGATKVKYY YAVNDGFPGD 1620 

HYAVMISKTG TNAGDFTWF EETPNGINKG GARFGLSTEA NGAKPQSVWI ERTVDLPAGT 1680 

KYVAFRHYNC SDLNYILLDD IQFTMGGSPT PTDYTYTVYR DGTKI KEGDT ETTFEEDGVA 1740 

TGNHEYCVEV KYTAGVSPKK CVNVTINPTQ FNPVQNLTAE QAPNSMDAIL KWNAPASKRA 180 0 

EVLNEDFENG IPASWKTIDA DGDGNNWTTT PPPGGSSFAG HNSAICVSSA SYINFEGPQN 1860 

40 PDNYLVTPEL SLPGGGTLTF WVCAQDANYA SEHYAVYASS TGNDASNFAN ALLEEVLTAK 192 0 

TWTAPEAIR GTRVQGTWYQ KTVQL PAGTK YVAFRHFGCT DFFWINLDDV VITSGNAPSY 198 0 

TYTIYRNNTQ IASGVTETTY RDPDLATGFY TYGVKWYPN GESAIETATL NITS L ADVTA 2 040 

QKPYTLTWG KTITVTCQGE AM I YDMNGRR LAAGRNTWY TAQGGHYAVM VWDGKSYVE 210 0 

KLAVK 2105

<212> Type : PRT
<211> Length : 2105

SequenceName : SEQ ID 173
SequenceDescription :

Sequence



<213> OrganismName : Porphyromonas gingivalis W83 
<400> PreSequenceString : 

MKTSERILSY FFLLCAVFSL GSCEGLYAQV TFPNYSPTAA SSIAVCSGEE TLIIDFTWQ 60 

55 EDSNGIKVNV KLADGVEYW GTAWSVTQG NAVTVAETNV SNPNEPVFTV KSADGNNWE 120 

LGTIVKLTIK RRAVCTAWSN AINAAETGFV FKDKVTVTIG DHSDSKESNS YSVNYPNLTI 180 

KQPAPQWKQ IGETIVREFS ITNGSQNPTQ TVYLS IEYPD EAYLTGVGAM TLQAKLGASG 240 

TYADLTPTVT NGKVRIYTLS GSSLGPDHLL TNGEIIYLKE TFKLKTCAPV TVYRVGWGCS 3 00 

IDSQCEIKTT AAT I TMAAGA ANITGYSVTG PDYRSPTFSL CQPFELTIKF SNSGAGGSMG 360 

60 AAFNINTIGR NDYYRPRGFV LHEFIDVKVN GKPVTNFKTD GSELDLRFDG QFTEDPDGPG 420 

VGLDDVDGDG FYDDLPVGAT ITITVTVRLK CD Q FTACNNA PNDLSDRGLI LKTLYQTS CD 480 

RTSWIDPNTW FNLS STHL YLi SRESVQDASH MPTVIEKDTP FDLKIMTSYY SILSSYNNIW 540 

YANPNTRYW EIVFPQGMTM PPKSDIEWTN IKNHPIDGSL VFTPPINLPD ANITTSGNTM 600 

TIVSPSQERG FVTLHGVKYD CTNNHEMWE YKIREVFNYL HFPDCLCPVG PIMCNTAKRY 660 

65 VLGCDPPCGR GAETSVPKIE RADNSLGWTD YTMRTRQSRS NI S AYDLAKA LYMDEVNITA 720 

TSIQHGTASS LGARFVLATG VDRVETLTPL SADIKIFRDG VQIVSVDGYT TFRS IRRNNN 780 

AEQVIDWDFT SILPAGGLLD RDKVDWTRY RVTSQNAHRV DTQVGREWFF YNSTANVSPI 840

WDEANPLTCL ILVPEIYIMG TFWNGTDPH VISQCTPTDIi GRVANHYARR FGSGAFEYAN 900

EYRPGVKIRN IYLKVPKSYT LNRVEYSNHR NHSSLGTTMP FEE INHTDVT SQGEYNIYKY 960 

QLADNEKAHF NITVKNAYGA ALKVNVSPTC ASSAVATNYD KISYYVDYID YYYYAATQPT 1020 

VPNSLDIVAD QSAGSNGIYS VSALNVYNRP ILYTNKPSXA LWQSGEVEL VGKTGEWKLR 10 8 0 

5 ISNPSSATAP YVWLALPTTS GLTIEKVTDA AGTEMAFTTY SGGKMYRLSE AGVPVGSALD 1140 

YTIHFTYSGC SPIALKAMGG WNCSAYPLSL DEYVCSSQVI DLKLKPLPAA MELTEIAVPD 120 0 

PTAAATLCST LEYIYSIQST DNANVYSPTF SIFPEEGLW TPNQVQVEYP AGS GNWAALN 12 60 

WNNSVNLLQ HPALTTIGYL KGLKEGESND NQRKILVKFY IKTECSFVSG KNFRVRADGR 132 0 

NACNQNAKGS GLAISTPPIR INGAIEPYTT SASTQLVTTT TSQSDCKAPK RVKWQTWG 13 80 

10 GETTPKAYLE ITLPLGFKYV TGSYAPDNTH PGGVNASPAG TEEVTLTANG EDKIKINVKA 1440 

GLTSGQSFAY TLEMKEDDDN VPACGNHTIE I VNVEE I EGL WCEGVQCAET LWTGANKFE 1500 

FELDKPYLDI TVISAVSTFS GGKENLTIEY KVSNTSTTQP LKPGAWTLF SDKDNNQVFS 15 60 

GGDVAVATQE LVAEITNTTP LTQIMKVKGV S S S HTGNIA/Xi TILPKDGCYC EIKSPMVTLN 1620 

HLPSNTOIGG TVGKPNEWKE BNNWTNDQVP DAAEDVEFAT EVNNPTDPNN PKSGPAKENL 1680 

15 HLDDIHQNGT AGRVIGNLIN DSDKDLVITT GNQLTINGW EDNNPNVGTI WKSSKDNPT 1740 

GTLLFANPGN NQNVGGTVEF YNQGYDCADC GMYRRSWQYF GIPVNESDFP YDHVD GNATV 18 00 

NQWVEPFNGD KWRPAPYAPD TKLQRFKGYQ ITNDVQAQPT GVYSFKGTLC VCDAFIiNLTR I860 

TSGVNYSGAN LIGNSYTGAI DIKQGIVFPP E VEQTVYLi FN TGTRD QWRKL NGS TVS GYRA 1920 

GQYLSVPKNT AGQDNLPDRI PSMHSFLVKM QNGASCTLQI LYDKLLKNTT WNGNGTQIT 1980 

20 WRSGNSGSAN MP SLVMDVLG NESADRLWIF TDGGLSFGFD NGWDGRKLTE KGL SQL YAMS 2040 

DIGNDKFQVA GVPELNNLLI GFDADKDGQY TLEFALSDHF AKGGVFLEDLi SRGVTRRWD 210 0 

GGSYSFDAKR GDSGARFRLS YDEEWVESAE VSVLVGTAGK RIVITNNSEH ACQANVYTTD 2160 

GKLLIRLDVK PGSKSMTEPL VDGVYWSLQ SPATSSNVRK VWN 2204
<212> Type : PRT

<211> Length : 2204

SequenceName : SEQ ID 174 
SequenceDescription : 






Sequence

<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString :

MNKFYKSKLQ SGBAAFVSMA TALTASAQIS FGGEPLSFSS RSAGTHSFDD AMTIRLTPDF 60
NPEDLIAQSR WQSQRDGRPV RXGQVIPVDV DFASKASHIS SIGDVDVYRL QFKLEGAKAI 120
TLYYDAFNIP EGGRLYIYTP DHEIVLGAYT NATHRRNGAF ATEPVPGSEL IMDYEVSRGG 180
TLPDIKISGA GYIFDKVGGR PVTDNHYGIG EDDSDSDCEX NINCPEGADW QAEKNGWQM 240
IMVKGQYISM CSGNLLNNTK GDFTPLIISA GHCASITTNF GVTQSELDKW IFTFHYEKRG 300
CSNGTLAIFR GNSIIGASMK AFLPIKGKSD GLLLQLNDEV PLRYRVYYNG WDSTPDIPSS 360
GAGIHHPAGD AMKISILKKT PALNTWISSS GSGGTDDHFY FKYDQGGTEG GSSGSSLFNQ 420
NKHWGTIiTG GAGNCGGTEF YGRLNSHWNE YASDGNTSRM DIYLDPQNNG QTTILNGTYR 480
DGYKPLPSVP RLLLQSTGDQ VELNWTAVPA DQYPSSYQVE YHIFRNGKEI ATTKELSYSD 540
AIDESIIGSG IIRYEVSARF IYPSPLDGVE SYKDTDKTSA DLAIGDIQTK LKPDVTPLPG 600
GGVSLSWKVP FLSQLVSRFG ESPNPVFKTF EVPYVSAAAA QTPNPPVGW IADKFMAGTY 660
PEKAAIAAVY VMPSAPDSTF HLFLKSNTNR RLQKVTTE>SD WQAGTWLRIN LDKPFPVN1SD 720
HMLFAGIRMP NKYKLNRAIR YVRNPDNLFS ITGKKISYUN GVSFEGYGIP SLLGYMAIKY 780
LWNTDAPKI DMSLVQEPYA KGTNVAPFPE LVGIYVYKNG TFIGTQDPSV TTYSVSDGTE 840
SDEYEIKLVY KGSGISNGVA QIENNNAWA YPSWTDRFS IKNAHMVHAA ALYSLDGKQV 900
RSWNNLRNGV TFSVQGLTAG TYMLVMQTAN GPVSQKIVKQ 940
<212> Type : PRT
<211> Length : 940

SequenceName : SEQ ID 175

SequenceDescription :

Sequence

<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString :

MKNLNKFVSI ALCSSLLGGM AFAQQTELGR NPNVRLLEST QQSVTKVQFR MDNLKFTEVQ 60
TPKGMAQVPT YTEGVNLSEK GMPTLPILSR SLAVSDTREM KVEWSSKFI EKKNVLIAPS 120
KGMIMRNEDP KKIPYVYGKS YSQNKFFPGE IATLDDPFIL RDVRGQWNF APLQYNPVTK 180
TLRIYTEITV AVSETSEQGK NILNKKGTFA GFEDTYKRMF MNYEPGRYTP VEEKQNGRMI 240
VIVAKKYEGD IKDFVDWKNQ RGLRTEVKVA EDIASPVTAN AIQQFVKQEY EKEGNDLTYV 300
LLIGDHKDIP AKITPGIKSD QVYGQIVGND HYNEVFIGRF SCESKEDLKT QIDRTIHYER 360
NITTEDKWLG QALCIASAEG GPSADNGESD IQHENVIA2STL LTQYGYTKII KCYDPGVTPK 420
NIIDAFNGGI SLANYTGHGS ETAWGTSHFG TTHVKQLiTNS NQLPFIFDVA CVNGDFLFSM 480
PCFAEALMRA QKDGKPTGTV AIIASTINQS WASPMRGQDE MNEILCEKHP NNIKRTFGGV 540
TMNGMFAMVE KYKKDGEKML DTWTVFGDPS LLVRTLVPTK MQVTAPAQIN LTDASVNVSC 600
DYNGAIATIS ANGKMFGSAV VENGTATINL TGLTNESTLT LTWGYNKET VIKTINTNGE 660

PNPYQPVSNL TATTQGQKVT LKWDAPSTKT NATTNTARSV DGIREEiVLLS VSDAPELLRS 720 

GQAEIVLEAH DVWNDGSGYQ ILLDADHDQY GQVIPSDTHT LWPNCSVPAN LFAPFEYTVP 78 0 

ENADPSCSPT NMIMDGTASV NIPAGTYDFA IAAPQANAKI WIAGQGPTKE DDYVFEAGKK 840 

5 YHFLMKKMGS GDGTEIiTISE GGGSDYTYTV YRDGTKIKEG LTATTFEEDG VAAGNHEYCV 900 

EVKYTAGVSP KVCKDVTVEG SNEFAPVQNL TGSAVGQKVT LKWDAPNGTP NPNPNPNPNP 960 

NPGTTTLSES FENGIPASWK TIDADGDGHG WKPGNAPGIA GYNSNGCVYS ESFGLGGIGV 1020 

LTPDNYLITP ALDLPNGGKL TFWVCAQDAN YASEHYAVYA SSTGNDASNF TNALLEETIT 108 0 

AKGVRSPEAI RGRIQGTWRQ KTVDLPAGTK YVAFRHFQST DMFYIDLDEV EIKANGKRAD 1140 

10 FTETFESSTH GEAPAEWTTI DADGDGQGWL CLSSGQLDWL TAHGGTNWS SFSWNGMALN 12 00 

PDNYLISKDV TGATKVKYYY AVNDGFPGDH YAVMISKTGT NAGDFTWFE ETPNGINKGG 12 60 

ARFGL STEAD GAKPQSVWIE RTVDLPAGTK YVAFRHYNCS DLNYILLDDI QFTMGGSPTP 1320 

TDYTYTVYRD GTKIKEGLTE TTFEEDGVAT GNHEYCVEVK YTAGVS PKKC VNVTVNSTQF 13 80 

NPVKNtiKAQP DGGDWkKWE &PSAKKTEGS REVKRIGDGL FVTIEPANBV RA^JRAKVVLiA 1440 

15 ADNVWGDNTG YQFLLDADHN TFGSVIPATG PLFTGTASSD LYSANFEYL I PANADPWTT 1500 

QNIIVTGQGE WIPGGVYDY CITMPEPASG KMW I AGDGGKf QPARYDDFTF EAGKKYTFTM 1560 

RRAGMGD GTD MEVEDDS PAS YTYTVYRDGT KIKEGLTETT YRDAGMSAQS HEY CVEVKYT 1620 

AGVSPKVCVD YIPDGVADVT AQKPYTLTW GKTITVTCQG EAMIYDMNGR RLAAGRNTW 1680 

YTAQGGYYAV MVWDGKSYV EKLAIK 1706

<212> Type : PRT

<211> Length : 1706

SequenceName : SEQ ID 176
SequenceDescription :

Sequence

<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString : 

MKRKPLFSAL VILSGFFGSV HPAS AQ KVPA PVDGERI I ME LSEADVECTI KIEAEDGYAN 60 

DIWADLNGNG KYDSGERLDS GEFRDVEFRQ TKAIVYGKMA KFL FRGS SAG DYGATFIDIS 120 

NCTGLTAFDC FANLLTELDL SKANGLTFVN CGKNQLTKLD LPANADIETL NCSKNKITSL 180 

NLSTYTKLKE LYVGDNGLTA LDLSANTLLE ELVYSNNEVT TINLSANTNIi KSLYCINNKM 240 

TGLDVAANKE LKILHCNNNQ LTALNLSANT KLTTLSFFNN ELTNIDLSDN TALEWLFCNG 3 00 

NKLTKLDV3A NANLIALQCS NNQLTALDLS KTPKLTTBNC YSNRIKDTAM RALIESLPTI 360 

TEGEGRFVPY NDDEGGEEEN VCTTEHVEMA KAKNWKVLTS WGEPFPGITA LISXEGESEY 420 

SVYAQDGILY LSGMEQGLPV QVYTVGGSMM YSSVASGSAM EIQLPRGAAY WRIGSHAIK 480 
TAMP 484 
<212> Type : PRT 
<211> Length : 484 

SequenceName : SEQ ID 177

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MKRAITLFAV LLMGWSVNAW SFACKTANGT AIPIGGGSAN VYVNIoAPWN VGQNLWDLS 60 

TQIFCHNDYP ETITDYVTLQ RGSAYGGVX.S NFS GTVKYS G SSYPFPTTSE TPRWYNSRT 120 

DKPWPVALYL TPVSSAGGVA IKAGSLIAVL ILRQTNNYNS DDFQFWNIY ANNDVWPTG 180 

GCDVSARDVT VTLPDYPGSV PIPLTVYCAK SQNLGYYLSG TTADAGNSIF TNTASFSPAQ 240 

GVGVQLTRNG TIIPANNTVS LGAVGTSAVS LGLTANYART GGQVTAGNVQ SIIGVTFVYQ 3 00 

<212> Type : PRT 
<211> Length : 300 
SequenceName : SEQ ID 178

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MGIKQHNGNT KADRLAELKI RSPSIQLIKF GAIGLNAIIF SPLDIAADTG SQYGTNITIN 60 

DGDRITGDTA DPSGNLYGVM TPAGNTPGNI NLGNDVTVNV NDAS GYAKGI IIQGKNSSLT 120 

ANRLTVDWG QTSAIGINLI GDYTHADLGT GSTIKSNDDG IIIGHSSTLT ATQFTIENSN 180 

GIGLTINDYG TSVDLGSGSK IKTDGSTGVY IGGLNGNNAN GAARFTATDL TIDVQGYSAM 240 

GINVQKNSW DLGTNSTIKT NGDNAHGLWS FGQVSANALT VDVTGAAANG VEVRGGTTTI 300 

GADSHISSAQ GGGLVTSSSD ATINFSGTAA QRNSIFSGGS YGASAQTATA VINMQNTDIT 3 60 
VDRNGSLALG LWALSGGRIT GDSLAITGAA GARGIYAMTN SQIDLTSDLV IDMSTPDQMA 420

IATQHDDGYA ASRINASGRM LINGSVL.SKG GLINLDMHPG SVWTGSSLSD NVNGGKLDVA 480 

MNNSVWNVTS NSNLDTLALS HSTVDFASHG STAGTFTTLN VENLSGNSTF IMRADWGEG 540 

NGVNNRGDLL NISGSSAGNH VLAIRNQGSE ATTGNEVLTV VKTTDGAASF SASSQVELGG 60 0 

5 YLYDVRKNGT NWELYASGTV PEPTPNPEPT PAPAQPPIVN PDPTPEPAPT PKPTTTADAG 660 

GNYLNVGYLL NYVENRTLMQ RMGDLRJSTQS K DGNIWLRSYG GSLDSFASGK LSGFDMGYSG 72 0 

IQFGGDKRLS DVMPLYVGLY IDSTHAS PDY SGGDGTARSD YMGMYASYMA QNGFYSDLVI 78 0 

KASRQKNSFH VLDSQNNGVN ANGTANGMSI SLEAGQRFNL SPTGYGFYIE PQTQLTYSHQ 840 

NEMAMKASNG LNIHLNHYES LLGRASMILG YDITAGNSQL NVYVKTGAIR EFSGDTEYLL 900 

10 NDSREKYSFK GNGWNNGVGV SAQYWKQHTF YLEADYTQGN LFDQKQVNGG YRFSF 955 

<212> Type : PRT 
<211> Length : 955 

SequenceName : SEQ ID 179
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T

<400> PreSequenceString :

MS KFVKTAI A AAMVMGVFTS TATIAAGNNG TARFYGTIED SVCSIVPDDH KLEVDMGDIG 60 
AEKLKNNGTT TPKSFQIRLQ DCVFDTQETM TTTFTGTVSS ANSGNYYTIF NTDTGAAFNN 120 
VSLAIGDSLG TSYKSGMGID QKIVKDTSTN KGKAKQTLNF NAWLVGAADA PDLGNFEANT 180 
TFQITYL 187 

<212> Type : PRT
<211> Length : 187

SequenceName : SEQ ID 180
SequenceDescription :

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MK1KTLAIW LSALSLSSAA ALADTTTWG GTIHFKGEW NAACAVDAGS VDQTVQLGQV 60 
35 RTASLKQAGA TSSAVGFNIQ LNDCDTXVAT KAAVAFLGTA IDATRTDVLA LQSSAAGSAT 120 

NVGVQ ILDRT GNALTLDGAT FSAQTTLNNG TNTIPFQARY YAIGEATPGA ANADATFKVQ 180 

YQ 182 

<212> Type : PRT 

<211> Length : 182 
SequenceName : SEQ ID 181

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MASISSLGVG SGLDLSSILD SLTAAQKATL TPISNQQSSF TAKLSAYGTL KSALTTFQTA 60 
NTALSKADLF SATSTTSSTT AFSATTAGNA IAGKYTISVT HLAQAQTLTT RTT RDDTKTA 120 
IATSDSKLTI QQGDDKDPIT IDISAANSSL SGIRDAINNA KAGVSASIIN VGNGEYRLSV 180 

50 TSNDTGLDNA MTLSVSGDDA LQSFMGYDAS ASSNGMEVSV AAQNAQLTVN NVAIENSSNT 240 
ISDALENITL NLNDVTTGNQ TLTITQDTSK VQTAIKDWVN AYNSLIDTFS SLTKYTAVDA 300 
GADSQSSSNG ALLGDSTLRT IQTQLKSMLS NTVSSSSYKT LAQIGITTDP SDGKLELDAD 360 
KLTAALKKDA SGVGALIVGD GKKTGITTTI GSNLTSWLST TGI I KAATDG VSKTLNKLTK 42 0 

DYNAASDRID AQVARYKEQF TQLDVLMTSL NSTSSYLTQQ FENNSNSK 468 

<212> Type : PRT
<211> Length : 468

SequenceName : SEQ ID 182
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MEGKADNWL ENGGRLDVLT GHTATNTRVD DGGTLDVRNG GTATTVSMGN GGVLLADSGA 60 

65 AVSGTRSDGK AFSIGGGQAD ALMLEKGSSF TLNAGDTATD TTVNGGLFTA RGGTLAGTTT 120 

LNNGAILTLS GKTVNNDTLT IREGDAX.LQG GALTGNGSVE KSGSGTLTVS NTTLTQKAVN 180 

LNEGTLTLND STVTTDVIAQ RGTALKXiTGS TVLNGAIDPT NVTLAS GATW NIPDNATVQS 240 
WDDLSHAGQ IHFTSTRTGK FVPATLKVKN LNGQNGTISL RVRPDMAQNN ADRLVIDGGR 300

ATGKTILNLV NAGNSASGLA TSGKGIQWE AINGATTEEG AFIQGNKLQA GAFNYSLNRD 3 60 

SDESWYLRSE NAYRAEVPLY ASMLTQAMDY DRILAGSRSH QTGVS GENNS VRLSIQGGHL 42 0 

GHDNNGGIAR GATPESSGSY GFVRLEGDLL RTEVAGMSVT AGVYGAAGHS SVDVKDDDGS 480 

5 RAGTVRDDAG SLGGYLNLIH NASGLWADIV AQGTRHSMKA SSDNNDFRVR GWGWLGSLET 540 

GLPFSITDNL MLEPQLQYTW QGLSLDDGQD NASYVKFGHG SAQHVRAGFR LGSHHDMNFG 60 0 

KGTS SRDTLR GSAKHSVREL PVNWWVQPSV IRTFSSRGDM SMGTAAAGSN MTFSPSQNGT 660 

SLDLQAGLEA RVRENXTLGV QASYAHSIKTG SSAEGYNSQA TLNVTF 706
<212> Type : PRT
<211> Length : 706

SequenceName : SEQ ID 183

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MAFSQAVSGL NAAATNLDVT GNNIANSATY" GFKSGTASFA DMFAGSKVGL GVKVAGITQD 60 
FTDGTTTNTG RGLDVAISQN GFFRLVDSNG SVFYSRNGQF KLDENRNLLN TQGLQLTGYP 12 0 

20 VTGTPPTIQQ GANPTNISIP NTLMAARTTT TASMQINLNS SDPLPTVTPF SASNADSYNK 18 0 

KGSVTVFDSQ GNAHDMSVYF VKTGDNNWQV YTQDSSDPMS IAKTATTLEF NANGTLVDGA 240 
MANNIATGAI NGAEPATFSL SFLNSMQQNTT GANNIVATTQ NGYKPGDLVS YQINDDGTW 3 00 

GNNSNEQTQL LGQIVLANFA NNEGLASEGD NVWSATQSSG VALLGTAGTG NFGTLTNGAL 360 
EASNVDLSKE LVNMIVAQRN YKSNAQTIKT QDQILNTRVN LR 402 

<212> Type : PRT
<211> Length : 402

SequenceName : SEQ ID 184
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKLVHMASGL AVAIALAACA DKSADIQTPA PAANTS I SAT QQPAIQQPNV- SGTVWIRQKV 60 
35 ALPPDAVLTV TLSDASLADA PSKVLAQKZW RTEGKQSPFS FVLPFNPADV QPNARILLSA 120 

AITVNDKLVF ITDTVQPVTN QGGTKADLXL. VPVQQTAVPV QASGGATTTV PSTSPTQVNP 180 

SSAVPAPTQY 190 

<212> Type : PRT 

<211> Length : 190 
SequenceName : SEQ ID 185

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MIIKKSGGRW QLSLLASWI SAFFLNTAYA WQQEYIVDTQ PGHSTERYTW DSDHQPDYND 60 

ILSQRIQSSQ RALGLEVNLA EETPVDVTSS MSMGWNFPLY EQVTTGPVAA LHYDGTTTSM 120 

YNEFGDSTTT LTDPLWHASV SSLGWRVDSR LGDLRPWAQI SYNQQFGENI WKAQSGLSRM 180 

50 TATNQNGNWL DVTVGADMLL NQNIAAYAAL TQAENTTNNS DYLYTMGVSA RF 232 



<212> Type : PRT 
<211> Length : 232 

SequenceName : SEQ ID 186
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKWCKRGYVL AAML ALA SAT IQAADVTITV NGKWAKPCT VSTTNATVDL GDLYSFSLMS 60 
AGAASAWHDV ALELTNCPVG TSRVTASFSG AADSTGYYKN QGTAQNIQLE LQDDSGNTLN 120 
TGATKTVQVD DSSQSAHFPL QVRALTVNGG ATQGTIQAVI SITYTYS 167 
<212> Type : PRT 
<211> Length : 167

SequenceName : SEQ ID 187
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKRAPLITGL LLISTSCAYA SSGGCGADST SGATNYSSW DDVTVNQTDN VTGREFTSAT 60 
LSSTNWQYAC SCSAGKAVKL VYMVSPVLTT TGHQTGYYKL NDSLDIKTTL QANDIPGLTT 120 
DQWSVNTRF TQIKSSTVYS AATQTGVCQG DTSRYGPVNI GANTTFTLYV TKPFLGSMTI 180 
PKTD I AVI KG AWVDGMGSPS TGDFHDLVKL SIQGNLTAPQ SCKINQGDVI KVNFGF INGQ 240 

10 KFTTRNAMPD GFTPVDFDIT YDCGDTSKIK NSLQMRIDGT TGWDQYNLV ARRRS SDNVP 300 
DVGIRIENLG GGVANIPFQN GILPVDPSGH GTVNMRAWPV NLVGGELETG KFQGTAT I TV 360 
MVR 363

<212> Type : PRT
<211> Length : 363

SequenceName : SEQ ID 188

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MQKNAAHTYA ISSLLVLSLT GCAWIPSTPL VQGATSAQPV PGPTPVANGS IFQSAQPINY 60 

GYQPLFEDRR PRNI GDTLT I VLQENVSASK SSSANASRDG KTNFGFDTVP RYLQGLFGNA 120 

RADVEAS GGN TFNGKGGANA SNTFSGTLTV TVDQVLVNGN LHWGEKQ I A INQGTEFIRF 180 

25 SGWNPRTIS GSNTVPSTQV ADARIEYVGN GYINEAQNMG VJLQRFFLNLS PM 232 



<212> Type : PRT 
<211> Length : 232 

SequenceName : SEQ ID 189 
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKRHLNTCYR LVWNHITGAF WAS ELARAQ GKRGGVAVAL SLAAVTSLPV LAADIWHPG 60 

ETVNGGTLVN HDNQFVSGTA DGVTVSTGLE LGPDSDENTG GQWIKAGGTG RNTTVTANGR 12 0 

QIVQAGGTAS DTVIRDGGGQ S LNGLAVNT T LDNRGEQWVH GGGKAAGT 1 1 NQDGYQTIKH 180 

GGLATGTIVN TGAEGGPESE MVS S GQMVGG TAESTTINKN GRQVIWSSGM ARDTLIYAGG 240 

DQTVHGEAHN TRLEGGNQYV HNGGTATETL INRDGWQVIK EGGTAAHTTI NQKESCR 297



<212> Type : PRT 
<211> Length : 297 

SequenceName : SEQ ID 190 
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MMMKTIKHLL CCAIAASALI STGVHAASWK DALSSAASEL GNQNSTTQEG GWSLASLTNL 60 
LSSGNQALSA DNMNNAAGIL QYCAKQKLAS VTDAENI KNQ VLEKLGLNSE EQKEDTNYLD 12 0 

GIQGLLKTKD GQQLNLDNIG TTPLAEKVKT KACDLVLKQG LNFIS 165 
<212> Type : PRT 
<211> Length : 165

SequenceName : SEQ ID 191
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MFKGQKTLAA LAVS LLF TAP VYAADEGSGE IHFKGEVIEA PCEIHQDDID KEVELGQVTT 60 
SHINQSHHSD AVAVDLL LVN CDLENSSNGS GGKI SKVAVT FDS SAKTTGA DPILNNTSTG 120 
65 EATGVGVRLM NKDQSNIVLG TATPDIDLAP TSSEQTLNFF AWMEQIDQAT PVTPGAVTAN 180 
ATYVLDYK 188 
<212> Type : PRT 
<211> Length : 188

SequenceName : SEQ ID 192
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MSAGSPKFTV RRIAALSLVS LWLAGCSDTS NPPAPVSSVN GNAPANTNSG MLITPPPKMG 60 
TTSTAQQPQI QPVQQPQIQA TQQPQIQPVQ PVAQQPVQME NGRIVYNRQY GNIPKGSYSG 12 0 

STYTVKKGDT LFYIAWITGN DFRDLAQRNN I QAP YALJsTVG QTLQVGNASG TPITGGNAIT 180 
QADAAEQ GW IKPAQNSTVA VASQPTITYS ESSGEQSA2STK ML PNNKPTAT TVTAPVTVPT 24 0 

ASTTEPTVSS TSTSTPISTW RWPTEGKVIE TFGASEGGNK GIDIAGSKGQ AIIATADGRV 3 00 

VYAGNALRGY GNLIIIKHND DYLSAYAHND TMLVREQQEV KAGQKIATMG STGTSSTRLH 3 60 

FEIRYKGKSV NPLRYLPQR 3 79 

<212> Type : PRT 
<211> Length : 379 

SequenceName : SEQ ID 193 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MIKFLSALIL LLVTTAAQAE RIRDLTSVQG VRQNSLIGTYG LWGLDGTGD QTTQTPFTTQ 60 
TLNNMLSQLG ITVPTGTNMQ LKNVAAVMVT ASLPPFGRQG QTIDVWSSM GNAKS LRGGT 120 
LLMTPLKGVD SQVYALAQGN I LVGGAGAS A GGSSVQWQL NGGRI TNGAV IERELPSQFG 180 
VGNTLNLQLN DEDFSMAQQI ADTINRVRGY GSATALDART IQVRVPSGNS SQVRFLADIQ 240 
NMQVNVTPQD AKWINSRTG SWMNREVTL DSCAVAQG-NL SVTVNRQANV SQPDTPFGGG 3 00 

QTWTPQTQI DLRQSGGSLQ SVRSSASLNN WRALNAL.GA TPMDLMSILQ SMQSAGCLRA 3 60 

KLEII 365

<212> Type : PRT
<211> Length : 365

SequenceName : SEQ ID 194

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MKRSIIAAAV FSSFFMSAGV FAADVDTGTL TXKGNIAETSP CKFEAGGDSV SINMPTVPTT 60 
VFEGKAKYST YDDAVGVTSS MLKISCPKEV AGVKLS L TL TN DKITGNDKAI AS SNDTVGDN 12 0 

SDVLDVSAPF NIESYKTAEG QYAIPFKAKY LKL TDNS VQ S GDVLSSLVMR VAQD 174 

<212> Type : PRT 
<211> Length : 174 

SequenceName : SEQ ID 195 

SequenceDescription : 

Sequence 

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MAVQKNVI KG ILAGTFALML SGCVTVPDAI KGSSTTPQQD LVRVMSAPQL YVGQEARFGG 60 
KWAVQNQQG KTRLEIATVP LDSGARPTLG EPS RGRI YAD VNGFLDPVDF RGQLVTWGP 120 
ITGAVDGKIG NTPYKFMVMQ VTGYKRWHLT QQVIMPPQPI DPWFYGGRGW PYGYGGWGWY 18 0 

NPGPARVQTV VTE 193 
<212> Type : PRT 
<211> Length : 193 

SequenceName : SEQ ID 196 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MRNKPFYLLC AFLWLAVSRV LAADSTITIR GYVRDNGCSV AAES TNFTVD LMENAAKQ FN 60 
NIGATTPWP FRILLSPCGN AVSAVKVGFT GVADSHNANL LALENTVSAA AGLGIQLLNE 120
QQNQIPLNAP SSAISWTTLT PGKPNTLNFY ARLMATQVPV TAGHINATAT FTLEYQ 176 

<212> Type : PRT 
<211> Length : 176 

SequenceName : SEQ ID 197 

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MKKLTVAALA VTTLLSGSAF AHEAGEFFMR AGSATVRPTE GAGGTLGSLG GFSVTNNTQL 60 
GLTFTYMATD NIGVELLAAT PFRHKIGTRA TGDIATVHHL PPTLMAQWYF GDASSKFRPY 120 
VGAGINYTTF FDNGFNDHGK EAGLSDLSLK DSWGAAGQVG VDYLINRDWL VNMSWYMDI 180 
DTTANYKLGG AQQHDSVRLD P WVFMF SAG Y RF 212 
<212> Type : PRT 
<211> Length : 212 

SequenceName : SEQ ID 198

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MFFKRGKILS AGRLNKKSLG IVMFLSVGLL LAGCSGSKSS DTGTYSGSVY TVKRGDTLYR 60 
ISRTTGTSVK ELARLNGISP PYTIEVGQKL KLGGAKSSSS TRKSTAKSTT KTASVTPSSA 12 0 

VPKSSWPPVG QRCWLWPTTG KVIMPYSTAD GGNKGIDISA PRGTP I YAAG AGKWYVGNQ 180 
LRGYGNLIMI KHSEDYITAY AHNDTMLVNN GQSVKAGQKI ATMGS TDAAS VRLHFQIRYR 240 
ATAI DPLRYL PPQGSKPKC 2 59 

<212> Type : PRT 
<211> Length : 259 

SequenceName : SEQ ID 199 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MAQVINTNSL SLITQNNINK NQSALSSSIE RLSSGLRINS AKDDAAGQAI ANRFTSNIKG 60 
LTQAARNAND GISVAQTTEG ALSEINNNLQ RIRELTVQAS TGTNSDSDLD SIQDEIKSRL 120 
DEIDRVSGQT QFNGVNVLAK DGSMKIQVGA NDGQTITIDL KKIDSDTLGL NGFNVNGGGA 180 
VANTAAS KAD LVAANATWG NKYTVSAGYD AAKASDLLAG VSDGDTVQAT INNGFGTAAS 240 
ATNYKYD S AS KSYSFDTTTA SAADVQKYLT PGV GDTAKGT ITIDGSAQDV QISSDGKITA 3 00 

SNGDKLYIDT TGRLTKNGSG ASLTEASLST LAANNTKATT IDIGGTSISF TGNSTTPDTI 3 60 

TYSVTGAKVD QAAFDKAVST SGNNVDFTTA GYSVNGTTGA VTKGVDSVYV DNNEALTTSD 42 0 

TVDFYLQDDG SVTNGSGKAV YKDADGKLTT DAETKAATTA DPLKALDEAI SSIDKFRSSL 48 0 

GAVQNRLDSA VTNLNNTTTN LSEAQSRIQD ADYATEVSNM SKAQIIQQAG NSVLAKANQV 540 
PQQVLSLLQG 550

<212> Type : PRT
<211> Length : 550

SequenceName : SEQ ID 200

SequenceDescription : 

Sequence 

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKKI ACL SAL AAVLAFTAGT SVAATSTVTG GYAQSDAQGQ MNKMGGFNLK YRYEEDNS PL 60 
GVIGSFTYTE KSRTASSGDY NKNQYYGITA GPAYRINDWA SIYGWGVGY GKFQTTEYPT 120 
YKHDTSDYGF SYGAGLQFNP MENVALDFSY EQSRIRSVDV GTWIAGVGYR F 171 

<212> Type : PRT 
<211> Length : 171 

SequenceName : SEQ ID 201 

SequenceDescription : 
Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 
5 MKRNIIGGAF TLASLMLAGH ALAEDGWNF VGEIVDTTCE VTSDTADQIV PLGKVSKNAF 60 
SGVGSLASPQ KFSIKLENCP ATYTQAAVRF DGTEAPGGDG DLKVGTPLTA GNPGDFTGTG 120 
QA I AATGVGI RIFNQSDNSQ VKLYNDSAYT AIDAEGKAEM KFIARYVATN ATVTAGTANA 18 0 

DSQFTVEYKK 190
<212> Type : PRT
<211> Length : 190

SequenceName : SEQ ID 202

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKKSTLALW MGIVASASVQ AAEIYNKDGN KLDVYGKVKA MHYMSDNASK DGDQSYIRFG 60 
FKGETQINDQ LTGYGRWEAE FAGNKAESDT AQQKTRLAFA GLKYKDLGSF DYGRNLGALY 120 
20 DVEAWTDMFP EFGGDSSAQT DNFMTKRASG LATYRNTDFF GVZE DGLNLTL QYQGKNENRD 180 
VKKQNGDGFG TSLTYDFGGS DFAISGAYTN SDRTNEQNLQ SRGTGKRAEA WATGLKYDAN 240 
NIYLATFYSE TRKMTPITGG FANKTQNFEA VAQYQFDFGL RPSLGYVIiSK GKDIEGIGDE 3 00 

DLVNYIDVGA TYYFNKNMSA FVDYKINQLD SDNKLNINND DTVAVGMTYQ F 351

<212> Type : PRT
<211> Length : 351

SequenceName : SEQ ID 203
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MRKQWLGICI AAGMLAACTS DDGQQQTVSV PQPAVCNGPI VEISGADPRF EPLNATANQD 60 

35 YQRDGKSYKI VQDPSRFSQA GLAAIYDAEP GSNLTASGEA FDPTKLTAAH PTLPIPSYAR 12 0 

ITNLANGRMI WRINDRGPY GNDRVISLSR AAADRLNTSN NTKVRIDPII VAQDGSLSGP 180 

GMACTTVAKQ TYALPAPPDL SGGAGTSSVS GPQGDILPVS NSTLKSEDPT GAPVTSSGFL 240 

GAPTTLAPGV LEGSEPTPAP QPWTASSTT PATSPAMVTP QAASQSASGN FMVQVGAVSD 30 0 

QARAQQYQQQ LGQKFGVPGR VTQNGAVWRI QLGPFASKAE AS TLQQRLQT EAQLQSFITT 3 60 

AQ 362

<212> Type : PRT 
<211> Length : 362 

SequenceName : SEQ ID 204 
SequenceDescription : 



Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 
50 MKKKTIYQCV ILFFSLLNIH VGMAGP EQVS MHIYGNWDQ GCDVATKSAL QNIHIGDFNI 60 
SDFQAANTVS TAADLNI D I T GCAAGITGAD VLFSGEADTL A3? TLLKLTDT GGSGGMATGI 120 
AVQILDAQSQ QEIPLNQVQP LTPLKAGDNT LKYQLRYKST KA.GATGGNAT AVLYFDLVYQ 18 0 

<212> Type : PRT 
<211> Length : 180

SequenceName : SEQ ID 205
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MKNKLLFMML TILGAPGIAA AAGYDLANSE YNFAVNELSK SSFNQAAIIG QAGTNNSAQL 60 
RQGGSKLLAV VAQEGS SNRA KIDQTGDYNL AYIDQAGSAN DASISQGAYG NTAMIIQKGS 12 0 

65 GNKANITQYG TQKTAVWQR QSQMAIRVTQ R 151 
<212> Type : PRT 
<211> Length : 151 
SequenceName : SEQ ID 206
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MMKFKKCLLP VAMLASFTLA GCQSNADDHA ADVYQTDQLN TKQETKTVNI ISILPAKVAV 60 
DNS QNKRNAQ AFGALIGAVA GGVIGHNVGS GSNSGTTAGA VGGGAVGAAA GSMWDKTLV 120 
EGVSLTYKEG TKVYTSTQEG KECQFTTGLA WITTTYNET RIQPNTKCPE KS 172 

<212> Type : PRT 
<211> Length : 172 

SequenceName : SEQ ID 207 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MQTKKNE I WV GIFLLAALLA ALFVCLKAAN VTSIRTESTY TLYATFDNIG GLKARSPVSI 60 
GGVWGRVAD I TLDPKT YLP RVTLEI EQRY NHIPDTSSLS IRTSGLLGEQ YLALNVGFED 120 
PELGTAILKD GDTI QDTKSA MVLEDLIGQF LYGSKGDDNK NSGDAPAAAP GNNETTEPVG 180 
TTK 183 
<212> Type : PRT 
<211> Length : 183 

SequenceName : SEQ ID 208 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MAPLAFSAQS LAESLTVEQR LELLEKALRE TQSELKKYKD EEKKKYTPAT VNRSVSTNDQ 60 
GYAANPFPTS SAAKPDAVLV KNEEKNASET GSIYSSMTLK DFSKFVKDEI GFSYNGYYRS 120 
GWGTASHGSP KSWAIGSLGR FGNEYSGWFD LQLKQRVYNE NGKRVDAWM IDGNVGQQYS 180 
TGWFGDNAGG ENFMQFSDMY VTTKGFLPFA PEADFWVGKH GAPKIEIQML DWKTQRTDAA 240 
AGVGLENWKV GPGKIDIALV REDVDDYDRS LQNKQQINTH TIDLRYKDIP LWDKATLMVS 3 00 

GRYVTANESA SEKDNQDNNG YYDWKDTWMF GTSLTQKFDK GGFNEFSFLV ANNS I ARNFG 3 60 

RYAGAS PFTT FNGRYYGDHT GGTAVRLTSQ GEAYIGDHF I VANAIVYSFG NNIYSYETGA 420 
HSDFESIRAV VRPAYIWDQY NQTGVELGYF TQQNKDANSN KFNESGYKTT LFHTFKVNTS 480 
MLTSRLEIRF YATYIKALEN ELDGFTFEDN KDAQFAVGAQ AEIWW 525 
<212> Type : PRT 
<211> Length : 525 

SequenceName : SEQ ID 209 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MKKRILSAVL VSGVTLSSAT TLSAVKADDF DAQIASQDSK INNLTAQQQA AQAQVNTIQG 60 
QVSALQTQQA ELQAENQRLE AQSATLGQQI QTLSSKIVAR MESLKQQARS AQKSNAATSY 120 
INAIINSKSV SDAINRVSAI REWSANEKM LQQQEQDKAA VEQKQQENQA AINTVAANQE 180 
TIAQNTNALN TQQAQLEAAQ LNLQAELTTA QDQKATLVAQ KAAAEEAARQ AAAAQAAAEA ' 240 
KAAAEAKALQ EQAAQAQAAA NNNTQATDVS DQQAAAADNT QAAQTGDSTE QSAAQAVNNS 3 00 

DQESTTATEA QPSASSASTA AVAANTS SAN TYPAGQCTWG VKSLAPWVGN YWGNGGQWAA 3 60 

SAAAAGYRVG STPSAGAVAV WNDGGYGHVA YVTGVQGGQI QVQEANYAGN QSIGNYRGWF 420 
NPGSVSYIYP N 43 1 

<212> Type : PRT 
<211> Length : 431 

SequenceName : SEQ ID 210 

SequenceDescription : 

Sequence 

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKVKKTYGFR KSKISKTLCG AVLGTVAAVS VAGQRVFADE TTTTSDVDTK WGTQTGNPA 60 

TNLPEAQGSA SKEAEQSQNQ AGETNGSIPV EVPKTDLDQA AKDAKSAGVN WQDADVNKG 12 0 

TVKTAEEAVQ KETEIKEDYT KQAEDIKKTT DQYKSDVAAH EAEVAKIKAK NQATKEQYEK 18 0 

5 DMAAHKAEVE RINAANAASK TAYEAKLAQY QADLAAVQKT MAANQAAYQK ALAAYQAELK 240 

RVQEANAAAK AAYDTAVAAN NAKNTE I AAA NEEIRKRNAT AKAEYETKLA QYQAELKRVQ 3 00 

EANAANEADY QAKLTAYQTE LARVQKANAD AKAAYEAAVA ANNAKNAALT AENTAIKQRN 3 60 

ENAKATYEAA LKQYEADLAA VKKANAANEA DYQAKLTAYQ TELARVQKAN ADAKAAYEAA 42 0 

VAANNAANAA LTAENTAIKK RNADAKADYE AKLAKYQADL AKYQKDLADY PVKLKAYEDE 480 

10 QASIKAALAE LEKHKNEDGN LTEPSAQNLV YDLEPNANLS LTTDGKFLKA SAVDDAFSKS 540 

TSKAKYDQKI LQLDDLDITN LEQSNDVASS MELYGNFGDK AGWSTTVSNIsT SQVKWGSVLL 600 

ERGQSATATY TNLQNSYYNG KKISKIVYKY TVDPKSKFQG QKVWLGIFTD PTLGVFASAY 660 

TGQVEKNTSI FIKNEFTFYD EDGKP INFDN ALLSVASLNR ENNSIEMAKD YTGKFVKISG 720 

SSIGEKNGMI YATDTLMFRQ GQGGARWTMY TRASEPGSGW DSSDAPNSWY GAGAIRMSGP 78 0 

15 NNSVTLGAIS STLWPADPT MAI ETGKKPN IWYSLNGKIR AVNVPKVTKE KPTPPVKPTA 84 0 

PTKPTYETEK PLKPAPVAPN YEKEPTPPTR TPDQAEPNKP TP P T YE TE KP LEPAPVEPSY 90 0 

EAEPTPPTRT PDQAEPNKPT PPTYETEKPL EPAPVEPSYE AEPTPPTPTP DQPEPNKPVE 960 

PTYEVIPTPP TDPVYQDLPT PPSVPTVHFH YFKLAVQPQV NKEIRNNNDI NIDRTLVAKQ 1020 

SWKFQLKTA DLPAGRDETT SFVLVDPLPS GYQFNPEATK AAS PGFDVTY DNATNTVTFK 1080 

20 ATAATIiATFN ADLTKSVATI YPTWGQVLN DGATYKNNFT LTVNDAYGIK SNWRVTTPG 1140 

KPNDPDNPNN NYIKPTKVNK NENGWIDGK TVLAGSTNYY ELTWDLDQYK NDRSSADTIQ 12 00 

KGFYYVDDYP EEALELRQDL VKITDANGME VTGVSVDNYT NLEAAPQEIR DVLiSKAGIRP 12 60 

KGAFQIFRAD NPREFYDTYV KTGIDLKIVS PMWKKQMGQ TGGSYENQAY QIDFGNGYAS 13 2 0 

NIIINNVPKI NPKKDVTLTL DPADTNNVDG QTIPLNTVFN YRLIGGIIPA DHSEELFEYN 13 8 0 

25 FYDDYDQTGD HYTGQYKVFA KVDITFKDGS I IKS GAEL TQ YTTAEVDTAK GAITIKFKEA 1440 

FLRSVSIDSA FQAESYIQMK RIAVGTFENT YINTVNGVTY SSNTVKTTTP EDPTDPTDPQ 1500 

DPSSPRTSTV INYKPQSTAY QPSSVQETLP NTGVTNNAYM PLLGIIGLVT SFSLLGLKAK 1560 

KD 1562 
<212> Type : PRT 

<211> Length : 1562

SequenceName : SEQ ID 211
SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MLTELKAVLK KPMLWITMVG VALVPALYNI IFLSSMWDPY GKVSDLPVAV VNKDKTATYE 60 

GKKMTIGKDM TDNMVRNKSL DYHFVDSEKA QKGLEKGDYY MIITLPEDLS QNAASVLTDE 12 0 

40 PKKLTIPYQT SKGHSFVASK MS ETAAKTLK ESVSKNITSS YTKSLFKNMS TLKTGLGSAA 18 0 

NASQKIATGS KQLANGSQVM TDNLNLLSNS SQSFAQGTNT LYSGLTAYTG GVGQLSAGLN 24 0 

NLNNGLTAYT NGVGQLANGS SQLSNQSQKL LGGVAQLA2STG SASIQQLVNA SSQLNQGLIK 3 00 

LSTATGLSEE QVQQFSSLIN QLGTLNQSIQ NYSDNGTATT ANS PDLSTYL SAITTAAQAI 3 60 

VNSGNTSQQT TTNQSNALAA VQATGAYQRL SAEDQSEIAA ALANTGSSTT TTGADANAVS 420 

45 QAQAILNNVQ SIQSALSTLQ TTTANTPTSP SASLTQIKNT ANSVLPSAAT SLTTLS SGLT 4 80 

QAKTALDSQV VPVSTALANG TAQLGSTFST GANSLMTGVG QYTNAVDILN AGANT LAAKN 540 

NQLTDGTSQL VNGANQLNSN SGQLTKGTAQ LANGANQIET GAGKLAAGGE SLTAGLTTLS 60 0 

SGSGELSKAL STAKNKLSLV AVDNDNAKTL SSPVTIKHTD KDNVKTNGVG MAP YMMS AAL 660 

MVMAISTNTI FRVALSGKQA KTLREWIDQK LAVNGLIAVT GAIILYFGVH IIGLSANFEL 72 0 

50 KTLGLIILTS ITFMVLVTTL VTWHDKFGSF AAL ILLLLQL GSSAGTYPLA VTDKFFQWN 780 

PYLPMSYSVS GLRETISMAG TIGMQLLALS LFFLTFAALG LLIARRRIRS VKVA 834 

<212> Type : PRT 
<211> Length : 834 
SequenceName : SEQ ID 212

SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MVSQKNKSKK GQSKTFTLIS NRINLLFFLI VALFTVLLLR LAQMQLYDAK FYKSKLTEST 60 

TYTIKTSSPR GQ I YDAKGVA LVENEVKEW AFTRSNTMTA KDI KANAKKL ADMVTLTESK 120 

VTKRQKKDYY LADPKNYQKI VKKLPNNKKY DNFGNNLTES K I YANAVKAV PNSAIDYSED 180 

65 EKKIIHIFSQ MNATSVFKfTA SLTTGDLTAE QIAVLATSKS DLKGISVKTD WERKTDKNSI 24 0 

TSIIGKVSSQ KTGLPAEEAN NYVKKGYSLN DRVGTS YIiEK QYENDLQGSR TVQAIKVNKE 3 00 

GKIISDKTTA KGTKGKNLKL TLDLEFQKGV EQILNQYFNS ELASGNTKYS EGVYAWLNP 3 60 
NTGAVLSMAG LEHDLKTGEV SSNALGAVTE VFTPGSWKG ATLTAGWENG VXj S GNQVLND 420

QPIQFAGSSP INSWFTNGST PLTASQSLEY SSNTYMVQLA LKLMGQDYHS GMTLSTDGYK 48 0 

EAMEKLRATY AQYGLGVSTG IDLPGESKGY TPEHYDPSNV LTE S FGQFDN Y"TAMQLAQYA 540 

AAVANGGKRI APHLVEGIYD NNKTGGLGNL VQSIDTKVLN NVSISSDDMG IIKEGFYNW 600 

NGGSYATGKT LAKGASVPIS AKTGTAEAYV TGDDGKSVYT SNLNWAYAP SSNPQIAVAV 660 

VLPHETDLHG TTSHAITRDI INLYQKMYPM NQ 692 
<212> Type : PRT 
<211> Length : 692 

SequenceName : SEQ ID 213 

SequenceDescription :



Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MTVLKYGLGI LLSAIILAII IGGLLFTYYV SSTPKLSEAK LKATNSSLVY DSNNNLIADL 60 

GAEKRESISS DSIPMKLVNA VTSIEDHRFF KHRGVDIYRI IGAAWSNLLH ICS T QGGS TLD 12 0 

QQLIKLAYFS TKESDQTLKR KAQEWLSLQ MEKKYTKEEI LTFYVNKVYM GNGNYGMRTA 18 0 

AKSYYGKDLK DLSIAQLATL AGIPQAPTQY DPYAQPKAAT SRRNTVLSQM Y'KHKKI TKRE 240 

YD AAV ATP IS DGLQELKRSS SYPKYMDNYL KQVISEVKKR TGQDIFSAGM KVYTNWADA 3 00 

QQYLWNIYNT DEYIAYPDDN FQVASTVMDV TNGKVIAQLG GRHQDTNVSF GTNQAVLTDR 3 60 

DWGSTMKPIS AYGPALESEA FTTTAQMLND SVYYYPGTTT QVYDWDHRYN C^WMTIQTAIQ 42 0 

QSRNVPAVRA IDAAGLDTAK GFLSGLGIDY PEMRYSNAIS SNTSSSEQECY GAS S EKMAAA 480 

YAAFSNGGTY YEPQYVNKIE FKDGTSETYD AKGNRAMKET TAYMMTDMLK T7VLTYGTGTE 540 

AAI PGLYQAG KTGTSNYDDN ELVEMSEKLG INPYGLGTIA PDENFVGYTP QYSMAVWTGY 600 

KNRLMPVYGD SMKIAAQVYR TMMAYLSSSG NSDWTMPDGL YRSGGYLYLN GSSGSNSRYG 660 

AAPATSSSSS SSSSSDSNNN DQNNNQTTEA SSDSSSSSSD ATTSSNP 707 
<212> Type : PRT 
<211> Length : 707 

SequenceName : SEQ ID 214 

SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MKSKTAKITL LSSLALAAFG ATNVFADEAS TQLNSDTVAA PTADTQASEP AATEKEQSPV 60 

VAWESHTQG NTTTTTSQVT SKELEDAKAN ANQEGLEVTE TEAQKQPSVE AADADNKAQA 120 

QTINTAVADY QKAKAEFPQK QEQYNKDFEK YQSDVKEYEA QKAAYEQYKK EVAQGLASGR 180 

VEKAQGLVFI NEPEAKLSIE GVNQYLTKEA RQKHATED I L QQYNTDNYTA SDFTQANPYD 240 

PKEDTWFKMK VGDQISVTYD NIVNSKYNDK KISKVKINYT LNS STNNEGS JVLVNLFHDPT 3 00 

KTIFIGAQTS NAGRNDKISV TMQI IFYDEN GNEIDLSGNN AIMSLSSLNH WTTKYGDHVE 360 

KVNLGDNEFV KIPGSSVDLH GNEIYSAKDN QYKANGATFN GDGAD GWDAV NTADGTPRAAT 42 0 

AYYGAGAMTY KGEPFTFTVG GNDQNLPTTI WFATNSAVAV PKDPGAKPTP PEKPELKKPT 480 

VTWHKNLWE TKTEEVPPVT PPTTPDEPTP EKPKTPEDPQ SPWAKSVSF R.TARKGEMRV 540 

RERDYQPTLP HAGAAKQNGL ATLGAISTAF AAATLIAARK KEN 583 
<212> Type : PRT 
<211> Length : 583 

SequenceName : SEQ ID 215 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MEQKIFSKRK SKIAGLCGAI LTTTWALAS GTVIEADETI EQPVAAETVS QADGDNPEQT 60 
TSVQQETAPQ QTKTSQSSDA TVDSEESATS PSDEQTVSQN DSNSSSQIDQ TIADTNRSDS 120 
DHISKTSAAT TEDQEEKVNS AKAQTAAATN NQDTRYSAKD AYGNSNFNKT LTEFGKNANV 180 
ADVTYNGVRD EYIWNDPSA PYVPNANEIA KYLKEYLTEL RNINNIAIPV E>SVDQVMQKY 240 
AQDRANEEAN EKNGLDHDTN LPIPNNLTWV AEDGHLDMDS SIQSKSQEGY TLASDKATAY 3 00 

YLALNWFSDY FNI YDDPNDG LKSFGHAVS I LSDGGTGMGL GLASGQDNEK GMWYAQLEFG 3 60 

GNDNEDNTND FSSLKNGKGE WVLYYKGSPV KFLPNTTFWY VKKGTS PDAA STPHNSDKPS 420 
FQSSKDLDPN FKADNRFQEG KEASVHQAIP ATFKSHRDEV GNKDQNSLSA QLPDTGVQKN 480 
NQLALIALGT GLILLSGLLL SKRKSLK 507 
<212> Type : PRT 
<211> Length : 507 

SequenceName : SEQ ID 216
SequenceDescription :
Sequence

<213> OrganismName : Streptococcus mutans UA159

<400> PreSequenceString :

MTFEKQKHFS LRKLKFGLVS VAIIAFLFAV TKTAEADETV XTEQRQTSKI NASSQKVENQ 60
TSNQVEAKTD SANKDPQEKT GSVATDAPSM NSANNMSQSD KQNTVNEISS DSQQTKTDEQ 120
TDLPQNSFKQ QSAHVKMTTE AEKTPSHSIN TFVNDGNGNW YYLGADGRNV TGSHTIGGKT 180
MYFAQDGKQV KGAFAQDSDG NKHYYDRDSG EMWTNRFVND QGNWYYLNND GVPVTGSITV 240
NGQSLYFNSD GSQVKGNFVE EDGSLRYYDK NSGDLLRKTS RTINGVNYQF DNDGNARAID 300
KIEWKTSLV VDSYEFGPSV SKIILEFNHK VTPAWHAGA MVTTAGVQRK ILNSYVSNAS 360
GHWYFDSSH YVTLELDIPY DPNDSSRNAS PFIFDSAAFR KfNWVNSYTVK VDNLQVQADG 420
SNSSQIISSE QDAINNRFLP TTORFSERGS YGNFNYAAYQ PEAAIGGEKN .PLIVWLHGIG 480
EVGTDINIPIi LASNVARLTE DPIQSHFTST GSGGQKGAYV LiVPQSSIPWS QNQTASLMAL 540
IKAYVASHPD IDSRRIYLAG VSNGGGMTLD MGVAYPNYFA ALVPIAASYS NQLTDNQITA 600
AALKALKGQP MWLIHTRTDK TISADSSVLP FYKELLQAGA QNKWLSYYET NVGKHHSGVT 660
YNGHWSWIYF LNDQVTGTQN TDNAKNWSGL SGMVATNPTY GGDAKATVNG RTYSNVFDWL 720
NGQRRR 726
<212> Type : PRT
<211> Length : 726

SequenceName : SEQ ID 217
SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159

<400> PreSequenceString :

MKIFIKKHQQ SILYYSLSFL LPSFIMFLVL FSKNIYWGSS TTILASDGFH QYVIFDALFR 60
NILHGTDSLF YSFKAGLGFN IFALTSYYLG SFLTPFTYFF NVKNMADAFY LFTLIKFGLI 120
GLSAFYSLGQ IYTKISKSLV LMLSTSYALM SFTSSQLELN NWLDVFILLP LIMLGLQRLV 180
EKRGIFLYFL TLTCLFIQNY YFGFMTAIFL TLWFFTQVSW DIRNRMKRLS DFVLVSIFAT 240
LTSAFMLLPT FLDLKSHGEV LTEQISLFSS DIWYFDFFAK SLLGSYDTTK YGSIPTIYIG 300
LLPLIFAITF FFVKSIKWQV KVAYFLLLAI IIASFIFQPL DLFWQGMHSP NMFLHRYSWA 360
FSLVIVIMAA ETLTRIKDIK LKNFYPAFTF LGVGLLATFL FKDYYNYLTQ VNFILTTIFL 420
VSYFIILFTF FNQLVSYKVI ISFTLIFTSF EIALNTFYQX EGIQTDWNFP SREVYEDNVK 480
EIDNYVKKTK KDNLEFFRTE KQIPQTYNDG MKFNYNSISQ FSSVKNNLSA QLLNSLGYYS 540
QGNHSTISYP NNTILMDSLF SIKYNINNQN PHKFGFHLKQ KNNKLQLYKN FYSLPLALMS 600
NHIYKDVKFD SYPLDNQQKF VNELTDLNLT LFKEIPIISS VGMQVLDNRV TINGSKGNKA 660
QVYYTVKCPA NSQLYISLPN LTVNNKDENV FITTNKHTSS YIIDESYYLF NLGNYKKTQT 720
LIFKLSFPKN KTVSYDLPHI YALDLTAYQK SIKQLKSQTV KTTTKKNKIF TTYVAKKRTS 780
LIYTLPYDKG WFAKQNGKAI KISKAQNGLM KIDVSKGSGK IIMTFVPQGL YQGILLTCLG 840
IFLFVFYQLY YKKFNLK 857

<212> Type : PRT
<211> Length : 857

SequenceName : SEQ ID 218
SequenceDescription :

Sequence
--------

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKLKHILRIG AVAFASILLL TACGSKTSKK TVTLATVGTT NPFSYEKKGK LTGYDIEVAK 60
EVFKASDKYD VKYQKTEWTS IFSGLDSDKY QIGANNISYT KERANKYLYS NPTASNPLVL 120
WPKDSDIKS YNDIAGHSTQ WQGNTTVSM LQKFNKNHEN ISIQVKLNFTSE DLAHQIRNVS 180
DGKYDFKIFE KISAETIIKE QGLDNLKVID LPSDQKPYVY FIFAQDQKDL QKFVNKRLKK 240
LYENGTLEKL SKKYLGGSYL PDKKDMK 267

<212> Type : PRT
<211> Length : 267

SequenceName : SEQ ID 219

SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MRFLVFLIAF FAAFYKFIET ERIDSNTVAV NPDSLILKRF XjKTNQLNGIM IVTGPDGKAQ 60
VFSNQSKVDG SPVSIKDYFP LASLQKLITG VAIQQLIDKG KLSLNTPLSK YYPQIENSEN 120

ITIQNLLTHT SGLADRKEVP QQVLTTQEQQ LDFSLTNYRV TYRKKWKYAN INYALLAGXI 180 

SQISGQNYAT YVRQHFLTAG KGWHFKKYIQ IKDKSKLAAL SVMDQSTTWD KLSKEVTSTF 240 

GAGDYASRPV DYWKFMMAFI NDQFVPVSEY QRSMKMTSKS YYGGLYISQK MLHANGGGFD 3 00 

5 TYSCFAYSNP KTKQ VMVIiF I TNGKYKRVKS LAAKAFKLYA DSYALRKNET SK 3 52 



<212> Type : PRT 
<211> Length : 352 

SequenceName : SEQ ID 220 
SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159

<400> PreSequenceString :

MKKKIALAAL SFVSAAVLAA CSSAPGGSSD AAGNKIGDTV KIGYNLELSG DVAAYGQAEK 60
NGANLAVEEI NKAGGIDGKK IKVISKDNKS DNGEASTIST NLATQSKVNA ILGPATSGAT 120
AAAAPNANDA AVPLVTPSGT QDNLTYSKGK VQDYIFRTTF QDSFQGKIIA KYATDNLKAK 180
KVALYYDKSS DYAQGIADAF KKAYKGKITV EDTFQAKDQD FQAALTKFKN KDFDAIVIPG 240
YYTETGLITK QARDMGLTQP ILGPDGFNDE KYVEGAGAAN TNNVHYVSGY S T KVALT JSTKA 300
EKFLKDYKAK YGEEPNMFAA LAYDSVYMIA DAAKDAKTSK DIATNLAKLK NFKGVTGKMT 360
IDKKHNPVKS AVMVGLKDGK EDTATAVEAK 390
<212> Type : PRT 
<211> Length : 390 

SequenceName : SEQ ID 221

SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159

<400> PreSequenceString : 

MKKLSLLLLV CLSLLGLFAC TSKKTADKKL TWAINS 1 1 A DITKNIAGNK WLHSIVX'VG 60 

RDPHEYEPLP EDVKKTSQAD VIFYNGINLE NGGNAWFTKL VKNAHKKTDK DYFAVSDSVK 120. 

TIYLENAKEK GKEDPHAWLD LKNGIIYAKN IMKRLSEKDP KNKSYYQKNF QAYSAKLEKL 180 

35 HKVAKEKISR IPTEKKMIVT SEGCFKYFSK AYDIPSAYIW EINTEEEGTP NQ I KAL VTCKL 240 

RKSRVSALFV ESSVDDRPMK TVSKDTGIPI AAKIFTDSVA KKGQAGDSYY AMMKWNIDKI 3 00 

ANGLSQ 306

<212> Type : PRT
<211> Length : 306

SequenceName : SEQ ID 222

SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 
MFVHTKTKKK RKWQRKVFLL LLLFLLPIVS VLAFIVLFIG GGTAESHDVE ATTGGVKIiSA 60 
KQFADKTKLG ISEEEAKNAL AFADRLMSRH HFTAQATAGV LAVGFRESGF DVKAVNUSGG 120 
VAGFFQWSGW GSSVNGDRWK VASKRELTLE VEVDLMSTEL DGRYADWKK VGSATDEKQA 180 

AKDWSQYYEG VAVSDGQTKA DKIESWATTI CEALKSGGTN YAKVNNTGTS STAIPQGVJEN 240
ISAFDGHAYE GSENYPQGQC TWYVYNRAKQ LGVSFSPYMG NGGQWYQVQG YHSSHTPKAH 300

TALSFVNGQA GSDPTYGHVA FVEAVKDDGS ILISEMNVYG QPAMTVAYRT FDAETAKQFW 360

YVEGK 365

<212> Type : PRT 

<211> Length : 365

SequenceName : SEQ ID 223 
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MKMKRKLLSL VSVLTILLGA FWVTKIVKAD QVTNYTNTAS ITKSDGTALS NDPSKAVTSJYW 60 
EPLSFSNSIT FPDEVSIKAG DTLTIKLPEQ LQFTTALTFD VMHTNGQLAG KATTDP3STTGE 120 
VTVTFTDIFE KLPNDKAMTL NFNAQLNHNN ISIPGWNFN YNNVAYSSYV KDKDITPISP 180
DVNKVGYQDK SNPGLIHWKV LINNKQGAID NLTLTDWGE DQEIVKDSLV AARLQYXiVGD 240 
DVDSLDEAAS RPYAEDFSKN VTYQTNDLGL TTGFTYTIPG SSNNAIFISY TTRLTSSQSA 3 00 






GKDVSNTIAI SGNNINYSNQ TGYARIESAY GRASSRVKRQ AETTTVTETT TSSSSETTTS 360

EATTETSSTT NNNSTTTETA TSTTGASTTQ TKTTASQTNV PTTTNITTTS KQVTKQKAKF 420 
VLPSTGEQAG LLLTTVGLVI VAVAGVYFYR TRR 453 
<212> Type : PRT 
<211> Length : 453 

SequenceName : SEQ ID 224 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MTFKKLVLGL LSFVAVFTLV ACSSSNSKNL QDDIKEKKKL WAVSPDYAP FEFKALWGK 60 
DTWGADIDL AKATAKELGV KLELSSMSFD NVLSSLKTGK ADIAISGLSY TKERAQAYDF 120
SEAYYKTENA ILIKKSDLNK YTMISSFNNK TKVAVQKGTI EEGLAKNQLK QSNITSLTSM 180

GEAWELKSG QVDAIDLEKP VAEGYVSQNS DLVLAKVALK TGEGDAKAVA LPKDSGQLVK 240 
TVNKVIKKLK KEDKYKQFIS DAVKLTGQQV D 271 
<212> Type : PRT 
<211> Length : 271 

SequenceName : SEQ ID 225 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MKKHFFMTFS LLLAAVFLVA CSNLSDSGQR NWDKINKRGM LKIATAGTLY PQSYHDDHNK 60 
LTGYDVEILK EIGKRLGLKV QFTEMGVDGM LTAIKSGQID VANYSLEDGN KNISKFLRTS 120

PYKYSFTSMV VRSKDDSGIH SWSDLKGKKA AGAASTNYMK IAKKLGAKLV VYDNVTNDVY 180

MKDLVNGRTD VIINDYYLQK IAVAAVKDKY AIKINQGLYA NPYSTSFTLS LKNKVLQKKI 240

NKAVKDMRKD GTLTKLSKKF FQGEDVTKKH YNSYKKIDIS DVD 283

<212> Type : PRT 
<211> Length : 283

SequenceName : SEQ ID 226

SequenceDescription :

Sequence 






<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString : 

MKLLKKMMQV ALATFFFGLL GTSTVFADDS EGWQFVQENG RTYYKKGALK ETYWRVIDGK 60
YYYFDPLSGE MWGWQYI PA PHKGVTIGPS PRIEIALRPD WFYFGQDGVL QEFVGKQVLE 120

AKTATNTNKH HGEEYDSQAE KRVYYFEDQR SYHTLKTGWI YEEGYWYYLQ KDGGFDSRIN 180
RLTVGELARG WVKDYPLTYD EEKLKAAPWY YLDPATGWQN LGNKWYYLRS SGAMATGWYQ 240

EGSTWYYLNA SNGDMKTGWF QVNGNWYYAY DSGALAVNTT VGGYYLNYNG EWVK 294 

<212> Type : PRT 
<211> Length : 294 

SequenceName : SEQ ID 227 

SequenceDescription : 

Sequence 

<213> OrganismName : Streptococcus pneumoniae R6 
<400> PreSequenceString :

MKLLKKMMQV LLAVFFFGLL ATNTVFANTT GGRFVDKDNR KYYVKDDHKA IYWHKIDGKT 60
YYFGDIGEMV VGWQYLEIPG TGYRDNLFDN QPVNEIGLQE KWYYFGQDGA LLEQTDKQVL 120
EAKTSENTGK VYGEQYPLSA EKRTYYFDNN YAVKTGWIYE DGNWYYLNKL GNFGDDSYNP 180

LPIGEVAKGW TQDFHVTIDI DRSKPAPWYY LDASGKMLTD WQKVNGKWYY FGSSGSMATG 240
WKYVRGKWYY LDNKNGDMKT GWQYLGNKWY YLRSSGAMVT GWYQDGLTWY YLMAGNGDMK 300

TGWFQVNGKW YYAYSSGALA VNTTVDGYSV NYNGEWVQ 338

<212> Type : PRT 
<211> Length : 338 

SequenceName : SEQ ID 228 

SequenceDescription : 



Sequence 
<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString : 

MNKKKMILTS LASVAILGAG FVASQPTWR AEESPVASQS KAEKDYDAAK KDAKNAKKAV 60 

5 EDAQKALDDA KAAQKKYDED QKKTEEKAAL EKAAS EEMDK AVAAVQQAYL AYQ QATDKAA 120 

KDAADKMIDE AKKREEEAKT KFNTVRAMW PEPEQLAETK KKSEEAKQKA PELTKKLEEA 180 

KAKLEEAEKK ATEAKQKVDA EEVAPQAKIA ELENQVHRLE QELKEIDESE SEDYAKEGFR 240 

APLQSKLDAK KAKLSKLEEL SDKIDELDAE IAKLEDQLKA AEENNNVEDY FKEGLEKTIA 3 00 

AKKAELEKTE ADLKKAVNEP EKPAPAPETP APEAPAEQPK PAPAPQPAPA PKPEKPAEQP 3 60 

10 KPEKTDDQQA EEDYARRSEE EYNRLTQQQP PKAEKPAPAP KTGWKQENGM WYFYNTDGSM 420 

ATGWLQNNGS WYYLNSNGAM ATGWLQYNGS WYYLNANGAM ATGWAKVNGS WYYLNANGAM 480 

ATGWLQYNGS WYYLNANGAM ATGWAKVNGS WYYLNANGAM ATGWLQYNGS WYYLNANGAM 54 0 

ATGWAKVNGS WYYLNANGAM ATGWVKD GDT WYYLEASGAM KASQWFKVSD KWYYVNGLGA 600 

LAVNTTVDGY KVNANGEWV 619

<212> Type : PRT

<211> Length : 619 

SequenceName : SEQ ID 229 
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString : 

MKILPFIARG TSYYLKMSVK KLVPFLWGL MLAAGDSVYA YSRGNGSIAR GDDYPAYYKN 60 

25 GSQEIDQWRM YSRQCTSFVA FRLSNVNGFE I PAA YGNANE WGHRARREGY RVDNTPTIGS 12 0 

ITWSTAGTYG HVAWVSNVMG DQIEIEEYNY GYTESYNKRV IKANTMTGFI HFKDLDSGSV 18 0 

GNSQSSASTG GTHYFKTKSA I KTEPLVS AT VIDYYYPGEK VHYDQILEKD GYKWLSYTAY 24 0 

NGSYRYVQLE AVNKNPLGNS VLSSTGGTHY FKIKSAIKTE PLVSATVIDY YYPGEKVHYD 3 00 

QILEKDGYKW LSYTAYNGSR RYIQLEGVTS SQNYQNQSGN ISSYGSNNSS TVGWKKINGS 3 60 

30 WYHFKSNGSK STGWLKDGSS WYYLKLSGEM QTGWLKENGS WYYLGS S GAM KTGWYQVSGE 420 
WYYSYSSGAL AINTTVDGYR VNSDGERV 448 
<212> Type : PRT 
<211> Length : 448 

SequenceName : SEQ ID 230
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 

<400> PreSequenceString :

MFASKSERKV HYSIRKFSIG VASVAVASLV MGSWHATEN EGSTQAATSS NMAKTEHRKA 60 

AKQWDEYIE KMLRE IQLDR RKHTQNVALN IKLSAIKTKY LRELNVLEEK SKDELPSEIK 12 0 

AKLDAAFEKF KKDTLKPGEK VAEAKKKVEE AKKKAEDQKE EDRRNYPTNT YKTLELEIAE 180 

FDVKVKEAEL ELVKEEAKES RNEGTI KQAK EKVESKKAEA TRLENI KTDR KKAEE EAKRK 240 

45 ADAKLKEANV ATSDQGKPKG RAKRGVPGEL ATPDKKENDA KSSDSSVGEE TLPSSSLKSG 3 00 

KKVAEAEKKV EEAEKKAKDQ KEEDRRNYPT NTYKTLDLE I AE SDVKVKEA ELELVKEEAK 3 60 

EPRDEEKIKQ AKAKVESKKA EATRLENIKT DRKKAEEEAK RKAAEEDKVK EKPAEQPQPA 420 

PATQPEKPAP KPEKPAEQPK AEKTDDQQAE EDYARRSEEE YNRLTQQQPP KTEKPAQPST 480 

PKTGWKQENG MWYFYNTDGS MATGWLQNNG SWYYLNANGA MATGWL QNNG SWYYLNANGS 540 

50 MATGWLQNNG SWYYLNANGA MATGWLQYNG SWYYLNSNGA MATGWLQYNG SWYYLNANGD 600 

MATGWLQNNG SWYYLNANGD MATGWLQYNG SWYYLNANGD MATGWVKDGD TWYYLEASGA 660 

MKASQWFKVS DKWYYVNGSG ALAVNTTVDG YGVNANGEWV N 7 01 
<212> Type : PRT 
<211> Length : 701 

SequenceName : SEQ ID 231

SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString : 

MKKTTILSLT TAAVI LAAYV PNEPILAAYV PNEPILADTP SSEVIKETKV GSIIQQNNIK 60 

YKVLTVEGNI GTVQVGNGVT PVEFEAGQDG KPFTIPTKIT VGDKVFTVTE VASQAFSYYP 12 0 

DETGRIVYYP SSITIPSSIK KIQKKGFHGS KAKTIIFDKG SQLEKIEDRA FDFSELEEIE 180 

65 LPASLEYIGT SAFSFSQKLK KLTFSSSSKL ELISHEAFAN LSNLEKLTLP KSVKTLGSNL 240 

FRLTTSLKHV DVEEGNE S FA SVDGVLFSKD KTQLIYYPSQ KNDESYKTPK ETKELASYSF 3 00 

NKNSYLKKLE LNEGLEKIGT FAFADAIKLE EISLPNSLET I ERLAF YGNL ELKELILPDN 360 
VKNFGKHVMN GLPKFLTLSG NNINSLPSFF LSGVLDSLKE IHIKNKSTEF SVKKDTFAIP 420

ETVKFYVTSE HIKDVLKSNL STSNDIIVEK VDMIKQETDV AKPKKNSNQG WGWVKDKGL 480

WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL 540

WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL 600

WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKVSGK 660

WYYTYNSGDL LVNTTTPDGY RVNANGEWVG 690

<212> Type : PRT 
<211> Length : 690 

SequenceName : SEQ ID 232 
SequenceDescription :

Sequence 






<213> OrganismName : Streptococcus pneumoniae R6

<400> PreSequenceString :

MEINVSKLRT DLPQVGVQPY RQVHAHSTGN PHSTVQNEAD YHWRKDPELG FFSHIVGNGC 60
IMQVGPVDNG AWDVGGGWNA ETYAAVELIE SHSTKEEFMT DYRLYIELLR NLADEAGLPK 120

TLDTGSLAGI KTHEYCTNNQ PNNHSDHVDP YPYLAKWGIS REQFKHDIEN GLTIETGWQK 180

NDTGYWYVHS DGSYPKDKFE KINGTWYYFD SSGYMLADRW RKHTDGNWYW FDNSGEMATG 240

WKKIADKWYY FNEEGAMKTG WVKYKDTWYY LDAKEGAMVS NAFIQSADGT GWYYLKPDGT 300

LADRPEFTVE PDGLITVK 318

<212> Type : PRT 
<211> Length : 318 

SequenceName : SEQ ID 233 

SequenceDescription :

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MTFAYWCILI AYLLPLFCAA YAKKAGGFRF KDNHNPRDFL ARTQGTAARA HAAQQNGFEA 60

FAPFAAAVLT AHATGNAGQA TVNTLAGLFI LFRLAFIWCY IADKAALRSL MWVGGFVCTV 120

GLFWAA 127

<212> Type : PRT
<211> Length : 127

SequenceName : SEQ ID 234 
SequenceDescription : 



Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MNKIYRIIWN SALNAWVAVS ELTRNHTKRA SATVKTAVLA TLLFATVQAN ATDEDEEEEL 60

ESVQRSWGS IQASMEGSGE LETISLSMTN DSKEFVDPYI WTLKAGDNL KIKQNTNENT 120

NASSFTYSLK KDLTGLINVE TEKLSFGANG KKVNIISDTK GLNFAKETAG TNGDTTVHLN 180

GIGSTLTDTL AGSSASHVDA GNQSTHYTRA ASIKDVLNAG WNIKGVKTGS TTGQSENVDF 240

VRTYDTVEFL SADTKTTTVN VESKDNGKRT EVKIGAKTSV IKEKDGKLVT GKGKGENGSS 300

TDEGEGLVTA KEVIDAVNKA GWRMKTTTAN GQTGQADKFE TVTSGTNVTF ASGKGTTATV 360

SKDDQGNITV MYDVNVGDAL NVNQLQNSGW NLDSKAVAGS SGKVISGNVS PSKGKMDETV 420

NINAGNNIEI SRNGKNIDIA TSMAPQFSSV SLGAGADAPT LSVDDEGALN VGSKDANKPV 480

RITNVAPGVK EGDVTNVAQL KGVAQNLNNR IDNVDGNARA GIAQAIATAG LVQAYLPGKS 540

MMAIGGGTYR GEAGYAIGYS SISDGGNWII KGTASGNSRG HFGASASVGY QW 592

<212> Type : PRT 
<211> Length : 592

SequenceName : SEQ ID 235 
SequenceDescription : 



Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MLLAEGQKSA VTEYYLNHGT WPSNNSDAGV ASTATDIKGK YVKEVKVEKG VITATMLSSG 60

VNNEIKGKKL SLWAKRQAGS VKWFCGQPVE RAANNAANDA VTAATANGNG KIDTKHLPST 120

CRDAASAVCI ETPPTAFYKN T 141
<212> Type : PRT 
<211> Length : 141 
SequenceName : SEQ ID 236
SequenceDescription : 

Sequence 

<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN YQYYRDFAEN 60 

KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSWSRNG VAALVGDQYI VSVAHNGGYN 12 0 

10 NVDFGAEGRN PDQHRFSYQI VKRNNYKPDN SHPYNGDYHM PRLHKFVTDA EPVEMTSDMR 180 

GNTYSDKEKY PERVRIGSGH HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGWSLSG 240 

DVRHANDYGP MP I AGAAGD S GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW 300 

FYDDIYRGDT HTVFFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ TVRLFDESLN 3 60 
ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN NINQGAGGLY FEGDFTVSPE 420

15 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTLHV QAKGENQGSI SVGDGTVILD 480 

QQADDKGKKQ AFSEIGLVSG RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN 540 

TDEGAMIVNH NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR 60 0 

LNLVYQPAAE DRTLLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG WSKMEGIPQG 660 

EIVWDNDWIN RTFKAENFHI QGGQAVISRN VAKVEGDWHL SNHAQAVFGV APHQSHTICT 72 0 

20 RSDWTGLTNC VEKTITDDKV IASLTKTDIS GNVSLADHAH LNLTGLATLN GNLSANGDTR 780 

YTVSHNATQN GNLSLVGNAQ ATFNQATLNG NTSASGNASF NLSNNAAQNG SLTLSDNAKA 840

NVSHSALNGN VSLADKAVFH FENSRFTGQL SGSKDTALHL KDSEWTLPSG TELGNLNLDN 900

AT I TLNS AYR HDAAGAQTGS VSDTPRRRSR RSLLSVTPPT SVESRFNTLT VNGKLNGQGT 960 

FRFMSELFGY RSDKLKLAES S EGT YTLAVN NTGNEPVSLD QLTWEGKDN KPLSENLNFT 1020 

25 LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA 108 0 

GRDAAEKTES VAEPARQAGG ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPATTAFPR 1140 

ARRARRDLPQ PQPQPQPQPQ PQRDLISRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR 1200 

NAVWTSGIRD TKHYRS QDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR TENTFDDGIG 1260 

NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGSLSDGIGG KIRRRVLHYG I QARYRAGFG 132 0 

30 GFGIEPYIGA TRYFVQKADY RYENVNIATP GLAFNRYRAG IKADYSFKPA QHISITPYLS 138 0 

LSYTDAASGK VRTRVNTAVL AQD FGKTRS A E WGVNAE I KG FTLSLHAAAA KGPQLEAQHS 1440 

AGIKLGYRW 1449 
<212> Type : PRT 
<211> Length : 1449 

SequenceName : SEQ ID 237

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString :

MNTLQKGFTL IELMIVIAIV GILAAVALPA YQDYTARAQV SEAILLAEGQ KSAVTEYYLN 60 
HGEWPSNNTS AGVASSTDIK GKYVQSVEVK NGWTATMAS SNVNNEIKGK KLSLWAKRQD 120

GSVKWFCGQP VKRNDTATTN DDVKADTAAN GKQIDTKHLP STCRDAASAG 170
<212> Type : PRT

<211> Length : 170 

SequenceName : SEQ ID 238

SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK DMDLQALHGR 60 
KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT DYTYPRYETT AETTSGGLTG 120

LTTSLSTLNA PALSRTQSDG SGSKSSLGLN IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF 180
FLRGIDWSP ANADTDVFIN IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL 240

IKPKTNAFEA AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN 300

SHEGYGYSDE AVRRHRQGQP 320

<212> Type : PRT

<211> Length : 320 

SequenceName : SEQ ID 239 
SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MRPIFLSFVL FPILITACST PDKSARWENI GTISNGNIHT YINKDSVRKN GNLMIFQDKK 60
WTNLKQERF ANTPAYKTAI AEWEIHCNNK TYRLSSLQLF DTKNTEISTQ NYTASSLRPM 120
SILSGTLTEK QYETVCGKKL 140
<212> Type : PRT

<211> Length : 140 

SequenceName : SEQ ID 240

SequenceDescription :

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :
MNKLFITALS ALALSACAGT WEGAKQDTAR NLDKTQAAAE RAAEQTGNAV SKGWDKTKEA 60
VKKGGNAVGR GISHLGGKIE NATE 84
<212> Type : PRT 
<211> Length : 84 

SequenceName : SEQ ID 241 
SequenceDescription : 


Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 

<400> PreSequenceString :
MKLLFIPLVL FVAVEHFYIA WLEMTQIPSE KAAETFKLPY EFMEQNRVQT LFGNQGLYNG 60

FLGIGLVWSR FAAPDNAVYG ATVLFLGFVL IAAAWGAFSS GNKGILVKQG LPAFLAAAAV 120

LAV 123 

<212> Type : PRT 

<211> Length : 123 
SequenceName : SEQ ID 242

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString :
MASSNVNNEI KDKKLSLWAK RQDGSVKWFC GQPVKRDAAT DADVTADSGN EIDTKHLPST 60 
CRDAASAVCT KTPEYYPNHG EWPKNFVIPA QAGIQVCRHG NLSGKKVSPV LSSRFPLSWE 120 

<212> Type : PRT

<211> Length : 120 

SequenceName : SEQ ID 243 
SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 

<400> PreSequenceString : 

MLLAEGQKSA VTEYYLNHGE WPSNNTSAGV ATSTDIKGKY VQSVEVKNGV VTATMASSNV 60
NNEIKGKKLS LWAKRQDGSV KWFCGQPVKR NDTATTNDDV KADTAANGKQ IDTKHLPSTA 120

STRKSTPN * 128

<212> Type : PRT 

<211> Length : 128 

SequenceName : SEQ ID 244 
SequenceDescription :

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 

<400> PreSequenceString :

MPIPFKPVLA AAAIAQAFPA FAADPAPQSA QTLNEITVTG THKTQKLGEE KIRRKTLDKL 60 

LVNDEHDLVR YDPGISWEG GRAGSNGFTI RGVDKDRVAI NVDGLAQAES RSSEAFQELF 120 

GAYGNFNANR NTSEPENFSE VTITKGADSL KSGSGALGGA VNYQTKSASD YVSEDKPYHL 180 

GIKGGSVGKN SQKFSSITAA GRLFGLDALL VYTRRFGKET KNRSTEGDIE IKNDGYVYNP 240 

TDTGGPSKYL TYVATGVARS QPDPQEWVNK STLFKLGYNF NDQNRIGWIF EDSRTDRFTN 300

ELSNLWTGTT TSAATGDYRH RQDVSYRRRS GVEYKNELEH GPWDSLKLRY DKQRIDMNTW 360 

TWDIPKNYDK RGINGEVYHS FRHIRQNTAQ WTADFEKQLD FSKAVWAAQY GLGGGKGDNA 420 



NSDYSYFAKL YDPKILASNQ AKITMLIENR SKYKFAYWNN AFHLGGNDRF RLNAGIRYDK 480

NSSSAKDDPK YTTAIRGQIP HLGSERAHAG FSYGTGFDWR FTKHLHLLAK YSTGFRAPTS 540

DETWLLFPHP DFYLKANPNL KAEKAKNWEL GLAGSGKAGN FKLSGFKTKY RDFIELTYMG 600

VSSDDKNNPR YAPLSDGTAL VSSPWQNQN RSAAWVKGIE FNGTWNLDSI GLPKGLHTGL 660

MVSYIKGKAT QNNGKETPIN ALSPWTAVYS LGYDAPSKRW GINAYATRTA AKKPSDTVHS 720

NDDLNNPWPY AKHSKAYTLF DLSAYLNIGK QVTLRAAAYN ITNKQYYTWE SLRSIREFGT 780 

VNRVDNKTHA GIQRFTSPGR SYNFTIEAKF 810 
<212> Type : PRT 
<211> Length : 810 
SequenceName : SEQ ID 245
SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MKKSLIALTL AALPVAAMAD VTLYGTIKTG VETSRSVEHN GGQWSVETG TGIVDLGSKI 60 
GFKGQEDLGN GLKAIWQVEQ KASIAGTDSG WGNRQSFIGL KGGFGKLRVG RLNSVLKDTG 120 
DINPWDSKSD YLGVNKIAEP EARLISVRYD SPEFAGLSGS VQYALNDNVG RHMSESYHAG 180 

FNYKNGGFFV QYGGAYKRHQ DVDDVKIEKY QIHRLVSGYD NDALYASVAV QQQDAKLVED 240
NSHNSQTEVA ATLAYRFGNV TPRVSYAHGF KGSVDDAKRD NTYDQVWGA EYDFSKRTSA 300

LVSAGWLQEG KGENKFVATA GGVGLRHKF 329 
<212> Type : PRT 
<211> Length : 329 

SequenceName : SEQ ID 246

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MKTLLLMPL VLTACGTLTG XPAHGGGKRF AVEQELVAAS SRAAVKEMDL SALKGRKAAL 60 
YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY PAYDTTATTK SDALSSVTTS 120 
TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT GDYRNETLLA NPRDVSFLTN LIQTVFYLRG 180 

IEWPPEYAD TDVFVTVDVF GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK 240
TAAYESQYQE QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 300
DVGNEVTRRR KGG 313

<212> Type : PRT
<211> Length : 313

SequenceName : SEQ ID 247

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MNKTLSILPV AILLGGCAAG GGNTFGSLDG GTGMGGSIVK MAVESQCRAE LNKRSEWRLT 60 
ALAMSAEKQA EWENKICACV AQEAPNQLTG NDVMQMLDPS TRNQALAALT AKTVSACFKH 120 
LYR 123

<212> Type : PRT

<211> Length : 123 

SequenceName : SEQ ID 248 

SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MNPLIHQAKE SSMQTRILSA VLLAFSTAAF AGGAFTLQFD NPSEDGGFTQ NQILSAPYGF 60
GCSGGNASPA LSWKNPPAGT KSFVLTVYDK DAPTGLGWMH WWADIPADV RRRNATSLQL 120

SRCASIADDQ SAAISAVISL QICRIRLTPS YTAKPMPSCC NHANTPQSAA SAALCGTSSS 180 

VSTAAA 186 

<212> Type : PRT 

<211> Length : 186 
SequenceName : SEQ ID 249

SequenceDescription : 
Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MNKTLKRRVF RHTALYAAIL MFSHTGGGGG AMAQTRQYAI IMNERNQPEV QWNGSYSIKD 60

KDRKREYTHH NHQQGGSSVS FNNSDELVSR QSGTAVFGTA TYLPPYGiCVS GFDAAALKER 12 0 

NNAVDWIHTT HPGLIGYSYD GWCRSATDC PKLVYKTRFS FDNPDLAKTG GGLDKHTEPS 18 0 

RDNSPIYKLK DHPWLGVSFN LGAEGIAKNG KTINKLVSSF NEKNSNNNLV YTTEGRDISL 24 0 

GNWQRETTAM AYYLNAKLHL LDKKQIQNIT DKTVQLGVLK PSIDVRTRNT GTAGILSYWA 3 00 

10 KWDIKDTGQI PVKLSLTQVK AGRCVNKDNP NKNTKTSSPA LTAPALWFGA GQDGKAEMYS 3 60 

ASVSTYPDSS SSRIFLQNLK RKTDTSRPGR YSLATLNKSD IESREPSFTS RQTVIRLDGG 420 

VQQIKLDRNN TEVTGFNGND GKNDTFGIVS EGSFMPDASE WKKVLLPWTV RAFNYDGRFN 48 0 

TVNKE ENNGK PKYSQKYRSR NNGKHERNLG DIVNSPIVAV GEYLATSAND GMVHIFKQSG 540 

GDKRS YNLKL SYIPGTMPRK DIESKDSTLA KELRA FAE KG YVGDRYGVDG GFVLPJRI 1'DD 60 0 

15 QDKQKHFFMF GAMGLGGRGA YALDLTKADD NDPTKASLFD VKDNGNNGNN GNNRVELGYT 660 

VGTPQIGKTH NGKYAAFLAS GYATKQIDSG ENKTALYVYD LESNNGTLIR KIEVTDGKGG 72 0 

LSSPTLVDKD LDGTVD I AYA GDRGGKMYRF DLSGNNPNSW TVRTIFQGTK PITSAPAISQ 780 

LKDKRWIFG TGSDLS EDDV LSTDEQHIYG IFDNDTNTGT AQEGLGKGLL EQKLSEENKT 840 

LFLTDYKRSD GSGDKGWWK LKDGQRVTVK PTWLRTAFV TIHKYTGNDK CGAETAILGI 900 

20 NTADGGKLTK KSARPIVPAA NSKVAQYSGD KKTSSGKSIP IGCMEKDGGT VCPNGYVYDK 960 

PVNVRYLDEK KTDGFS TTAD GDAGGSGTFK EGKKPARNNR CFSGKGVRTL LMNDLDSLDI 102 0 

TGPMCGMKRI SWREVFY 1037
<212> Type : PRT 
<211> Length : 1037 

SequenceName : SEQ ID 250

SequenceDescription : 



Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MKHPKQTLIA ALLTTAATAA PLPWTSFSI LGDVAKQIGG ERVSIQSLVG ANQDTHAYHM 60 
TSGDIKKIRS AKLVIi INGLG LEAADIQRAV KQSKVSYAEA TKGIQPLKAE EEGGHHHDHD 12 0 

HDHDHDHEGH HHDHGEYDPH WNDPVLMSA YAQNVAEALI KADPEGKVYY QQRLGNYQMQ 180 

35 LKKLHSDAQA AFNAVPAAKR KVLTGHDAFS YMGKRYHXEF XAPQGVSSEA EPSAKQVAAI 240 
IRQIKREGIK AVFTENIKDT RMVDRIAKET GVNVS GKLYS DALGNAPADT YIGMYRHNIK 3 00 

ALTNAMKQ 308

<212> Type : PRT 
<211> Length : 308 

SequenceName : SEQ ID 251

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString : 

MKKRILSAVL VSGVTLGAAT TVGAEDLSTK IAKQDSIISN LTTEQKAAQN QVSALQAQVS 60 
SLQSEQDKLT ARNTELEALS KRFEQEIKAL TSQIVARNEK LKNQARSAYK NNETSGYINA 120 
LLNSKSISDV VNRLVAINRA VSANAKLLEQ QKADKVSLEE KQAANQTAIN TIAANMAMAE 180 

ENQNTLRTQQ ANLEAATANL ALQLASATED KANLVAQKEA AEKAAAEALA QEQAAKVKAQ 240
EQAAQQAASV EAAKSAITPA PQATPAAQSS NAIEPAALTA PAAPSARPQT SYDSSNTYPV 300

GQCTWGAKSL APWAGNNWGN GGQWAYSAQA AGYRTGSTPM VGAIAVWNDG GYGHVAVWE 360 
VQSASSIRVM ESNYSGRQYI ADHRGWFNPT GVTFIYPH 398 
<212> Type : PRT 

<211> Length : 398

SequenceName : SEQ ID 252 
SequenceDescription : 

Sequence 

<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MITIKNPKIL KWLKYVLSAI LSLIILVIII GGLLFTFYIS SAPKLSEAQL KSTNSSLVYD 60 

GNNNL I ADLG SEKRENVTAD SIPINLVNAI TSIEDKRFFN HRGVDLYRIF GAAFHNLTSQ 12 0 

65 TTQGGSTLDQ QLIKLAYFST NESDQTLKRK AQEVWLALQM ERKYTKQEIL TF Y INKVYMG 18 0 

NGNYGMLTAA KSYYGKDLKD LSYAQLALLA GIPQAPSQYD PYLHPEAAQN RRNWLQQMY 240 

MEKHLTKAEY ETAIATPVAE GLQSLQQRST YPKYMDNYLK QVIEEVKKET NKDIFTAGLK 3 00 
VYTNIIPDAQ QTLYNIYHSG DYVYYPDQDF QVASTIVDVT NGHVIAQLGG RNQDENVSFG 360

TNQAVLTDRD WGSTMKPITA YAPAIESGVY TSTAQSTNDS VYYWPGTTTQ LFNWDLRYNG 420 

WMTIQAAIML S RNVPAVRAL EAAGLDYARS FLSSLGINYP EMHYSNAISS NNSSSDKKYG 480 

ASSEKMAAAY AAFANGGIYH KPRYVNKVEF SDGTSKTFDE KGKRAMKETT AYMMTDMLKT 540 

5 VLTYGTGTAA AIPGVAQAGK TGTSMYTDEE LAKIGEKYGL YPDYVGTLAP DENFVGFTKR 600 

YAMAVWTGYK NRLTPVYGSS LEIASDVYRS MMTYLTNGYS EDWTMPNGLY RSGGFLYLSG 660 

TYASNTDYTN SVYNNLYSNN TTTASSQTTS DDTSSSNDTS NSTNTDNNGS HPSTDDKKTT 720 

H 721 
<212> Type : PRT 
<211> Length : 721

SequenceName : SEQ ID 253 

SequenceDescription : 



Sequence 


<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString : 

MIITKKSLFV TSVALSLAPL VTAQAQEWTP RSVTEIKSEL VLVDNVFTYT VKYGDTLSTI 60 
AEAMGIDVHV LGDINHIANI DLIFPDTILT ANYNQHGQAT TLTVQAPASS PASVSHVPSS 120

EPLPQASATS QSTVPMAPSA TPSDVPTTPL ASAKPDSFVT ASSELTSSTN DVSTELSSES 180
QKQPEVSQEA VPTPKAAETT EVEPKTDISE DPTSANRPVP NESASEEASS AAPAQAPAEK 240
EETSQMLTAP AAQKAVADTT SVATSNGLSY APNHAYNPMN AGLQPQTAAF KEEVASAFGI 300

TSFSGYRPGD PGDHGKGLAI DFMVPVSSTL GDQVAQYAID HMAERGISYV IWKQRFYAPF 360 
ASIYGPAYTW NPMPDRGSIT ENHYDHVHVS FNA * 393 

<212> Type : PRT

<211> Length : 393 

SequenceName : SEQ ID 254
SequenceDescription : 

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MKKKILLMMS LISVFFAWQL TQAKQVLAEG KVKWTTFYP VYEFTKGVIG NDGDVSMLMK 60 
AGTEPHDFEP STKDIKKIQD ADAFVYMDDN METWVSDVKK SLTSKKVTIV KGTGNMLLVA 120
GAGHDHHHED ADKKHEHNKH SEEGHNHAFD PHVWLSPYRS ITWENIRDS LSKAYPEKAE 180
NFKANAATYI EKLKELDKDY TAALSDAKQK SFVTQHAAFG YMALDYGLNQ ISINGVTPDA 240
EPSAKRIATL SKYVKKYGIK YIYFEENASS KVAKTLAKEA GVKAAVLSPL EGLTKKEMKA 300
GQDYFTVMRK NLETLRLTTD VAGKEILPEK DTTKTVYNGY FKDKEVKDRQ LSDWSGSWQS 360

VYPYLQDGTL DQVWDYKAKK SKGKMTAAEY KDYYTTGYKT DVEQIKINGK KKTMTFVRNG 420
EKKTFTYTYA GKEILTYPKG NRGVRFMFEA KEPNAGEFKY VQFSDHAIAP EKAEHFHLYW 480 
GGDSQEKLHK ELEHWPTYYG SDLSGREIAQ EINAH 515 
<212> Type : PRT 
<211> Length : 515 
SequenceName : SEQ ID 255

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MKKFHRFLVS GVILLGFNGL VPTMPSTLIS QQENLVHAAV LGDNYPSKWK KGNGIDSWNM 60 
YIRQCTSFAA FRLSSANGFQ LPKGYGNACT WGHIAKNQGY PVNKTPSIGA IAWFDKNAYQ 120 
SNAAYDHVAW VADIRGDTVT IEEYNYNAGQ GPERYHKRQI PKSQVSGYIH FKDLSSQTSH 180

SYPRQLKHIS QASFDPSGTY HFTTRLPVKG QTSIDSPDLA YYEAGQSVYY DKWTAGGYT 240
WLSYLSFSGN RRYIPIKEPA QSWQNDNTK PSIKVGDTVT FPGVFRVDQL VNNLIVNKEL 300

AGGDPTPLNW IDPTPLDETD NQGKVLGNQI LRVGEYFTVT GSYKVLKIDQ PSNGIYVQIG 360 
SRGTWVNADK ANKL 374 
<212> Type : PRT 

<211> Length : 374

SequenceName : SEQ ID 256 
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString : 
MLKFTSNILA TSVAETTQVA PGGCCCCCTT CCFSIATGSG NSQGGSGSYT PGK 53



<212> Type : PRT 
<211> Length : 53 
SequenceName : SEQ ID 257

SequenceDescription :



Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :
MGESYSVEAV LTAVDKTFGK TLQSAIRSIE GLEKRSTGFS SVSQKASSMF KSMLGANLAG 60
QAISAMTRTV SSGLGSMLGE MNSSAKAWKT FDANLADIGF GKKQILAVKT AMQDYATKTI 120
YSASDMASTY AQtiAAVGVKD TGKLVKAFGG LAASAENPKQ AMKSISQQMT QAVGRPTVAW 180

QDFRIMLEQT PAGMAKVAKS MGKNLDELVA DIQAGRVKTS DFLEAVKKAG NDKSFQKMAT 240
EFKTVDQAID GMREGLSNKL QPAFEKVNQF GIRAIEAIGK QLDKVDFSKF ASNLGKFLEG 300
INIDKIVSNI SSAVSSVTSK VKEFWDGFKQ TGAISAFSGA LQSVWGALKN VASAMSGGNW 360
KTFGATVGGI VKHVSNFAKA VSDVLGKMDP GRLRSWIATF AAVAGGFKLF EKLTGQSVIG 420
SFLDKIGSKF GLFGNKAKEG TDKASNGARR SGGIISQIFS GLGNIVKSAG TAISTAAKGI 480

GVGIKTALSG IPPYH 495
<212> Type : PRT
<211> Length : 495

SequenceName : SEQ ID 258
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString : 

MKKGFFLMVM WSLVMIAGC DKSANPKQPT QGMSWTSFY PMYAMTKEVS GDLNDVRMIQ 60
SGAGIHSFEP SVNDVAAIYD ADLFVYHSHT LEAWARDLDP NLKKSKVDVF EASKPLTLDR 120

VKGLEDMEVT QGIDPATLYD PHTWTDPVLA GEEAVNIAKE LGRLDPKHKD SYTKNAKAFK 180
KEAEQLTEEY TQKFKKVRSK TFVTQHTAFS YLAKRFGLKQ LGISGISPEQ EPSPRQLKEI 240
QDFVKEYNVK TIFAEDNVNP KIAHAIAKST GAKVKTLSPL EAAPSGNKTY LENLRANLEV 300

LYQQLK 306

<212> Type : PRT 
<211> Length : 306 

SequenceName : SEQ ID 259 
SequenceDescription : 


Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MEKKQRFSLR KYKSGTFSVL IGSVFLMMTT TVAADELSTM SEPTITNHTQ QQAQHLTNTE 60
LSSAESKSQD TSQITPKTNR EKEQPQGLVS EPTTTELADT DAAPMANTGP DATQKSASLP 120
PVNTDVHDWV KTKGAWDKGY KGQGKWAVI DTGIDPAHQS MRISDVSTAK VKSKEDMLAR 180
QKAAGINYGS WINDKWFAH NYVENSDNIK ENQFEDFDED WENFEFDAEA EPKAIKKHKI 240
YRPQSTQAPK ETVIKTEETD GSHDIDWTQT DDDTKYESHG MHVTGIVAGN SKEAAATGER 300

FLGIAPEAQV MFMRVFANDV MGSAESLFIK AIEDAVALGA DVINLSLGTA NGAQLSGSKP 360
LMEAIEKAKK AGVSVWAAG NERVYGSDHD DPLAINPDYG LVGSPSTGRT PTSVAAINSK 420
WVIQRLMTVK ELENRADLNH GKAIYSESVD FKNIKDSLGY DKSHQFAYVK ESTDAGYKAQ 480
DVKDKIALIE RDPNKTYDEM IALAKKHGAL GVLIFNNKPG QSNRSMRLTA NGMGIPSAFI 540
SHEFGKAMSQ LNGNGTGSLE FDSWSKAPS QKGNEMNHFS NWGLTSDGYL KPDITAPGGD 600

IYSTYNDNHY GSQTGTSMAS PQIAGASLLV KQYLEKTQPN LPKEKIADIV KNLLMSNAQI 660
HVNPETKTTT SPRQQGAGLL NIDGAVTSGL YVTGKDNYGS ISLGNITDTM TFDVTVHNLS 720
NKDKTLRYDT ELLTDHVDPQ KGRFTLTSRS LKTYQGGEVT VPANGKVTVR VTMDVSQFTK 780
ELTKQMSNGY YLEGFVRFRD SQDDQLNRVN IPFVGFKGQF ENLAVAEESI YRLKSQGKTG 840
FYFDESGPKD DIYVGKHFTG LVTLGSETNV STKTISDNGL HTLGTFKNAD GKFILEKNAQ 900

GNPVLAISPN GDNNQDFAAF KGVFLRKYQG LKASVYHASD KEHKNPLWVS PESFKGDKNF 960
NSDIRFAKST TLLGTAFSGK SLTGAELPDG YYHYWSYYP DWGAKRQEM TFDMILDRQK 1020
PVLSQATFDP ETNRFKPEPL KDRGLAGVRK DSVFYLERKD NKPYTVTIND SYKYVSVEDN 1080
KTFVERQADG SFILPLDKAK LGDFYYMVED FAGNVAIAKL GDHLPQTLGK TPIKLKLTDG 1140
NYQTKETLKD NLEMTQSDTG LVTNQAQLAV VHRNQPQSQL TKMNQDFFIS PNEDGNKDFV 1200

AFKGLKNNVY NDLTVNVYAK DDHQKQTPIW SSQAGASASA IESTAWYGIT ARGSKVMPGD 1260
YQYWTYRDE HGKEHQKQYT ISVNDKKPMI TQGRFDTING VDHFTPDKTK ALGSSGIVRE 1320
EVFYLAKKNG RKFDVTEGKD GITVSDNKMY IPKNPDGSYT ISKRDGVTLS DYYYLVEDRA 1380
GNVSFATLRD LKAVGKDKAV VNFGLDLPVP EDKQIVNFTY LVRDADGKPH ENLEYYNNSG 1440

NSLILPYGKY TVELLTYDTN AAKLESDKIV SFTLSADNNF QQVTFKMTML* ATSQITAHFD 1500

HLLPEGSRVS LKTAQGQLIP LEQSLYVPKA YGKTVQEGTY EWVSLPKGY"" RIEGNTKVNT 1560

LPNEVHELSL RLVKVGDASD STGDHKVMSK NNSQALTAFA TPTKTTTSATT AKALPSAGEK 1620

MGLKLRIVGL VLLGLTCVFS RKKSTKD 1647



<212> Type : PRT 

<211> Length : 1647 

SequenceName : SEQ ID 260 
SequenceDescription :


Sequence 



<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols 
<400> PreSequenceString :

MMRSLFSGVS GMQNHQTRMD VIGNNVANVN TTGFKRGRVN FQDLISQQLS AAARPNEEVG 60
GVNPKEVGLG VLIASIDTVH TQGALQTTGI NTDVSIQGSG FFVLKSGEKT FFTRAGAFGV 120

DNAGTLVNPA NGMRVQGWMA QDVAGERLIN SSAQTQDLVI PIGQKIDAQC TSTVHYACNL 180
DKRLPELAAD ANEADVRKST WTTDFQVYDS FGQQHTLQIN FSRVPGTNNC WQATVAVDPG 240
TEVDTQTRVG VGTSDGAANT FIVNFDNFGH LASVTDTAGN VTGPTGQVLL EASYDWGAN 300

PDDAGQVTRH AFTLNLGEIG TARNTITQFA ERSTTKAYRQ DGYAMG YL EKT FKIDQSGVIT 360

GVYSNGVSQD IGQLALAGFA NQGGLEKAGE NTYVQSNNSG IANISTSGVM GKGKLIAGTL 420
EMSNVDLTDQ FTDMIITQKG FQAGAKTIQT SDTMLDTVLS LKR 463
<212> Type : PRT 
<211> Length : 463 

SequenceName : SEQ ID 261

SequenceDescription :

Sequence 



<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols



<400> PreSequenceString : 

MGCMRWGSVL CVWGVGASG GVLGQEFSPK LTGSATLEWG ISYGKGVGSI^ GQAPGAVMGT 60 

GPYNLKHGFR TTNTVGVSFP LVMRTTHTRR GQHPALYAEL KVADLQADLS QGKAGFAVKR 120 

KG KVEATLH C YGAYLTIGKN PTFLTNFARL WKPWVTAQYQ EDAVQYAPG^ GGLGGKVGYR 180 

35 AQDIGGSGVS LDVGFLS FAS NGAWDSTDPT HSKYGFGADL KLMYARAGHE 3 LCTVELASNV 240 

TLEDGYLIGA QKDANNQNKD KLLWNVGGRL TLEPGAGFRF SFALDAGNQE2 QSAQDFQNRT 3 00 

QRAQSELTAL SNNLFQGESQ KQEAWVTQW QQATQTVTAG VRSALESRGT? TYINALEAVQ 3 60 

PNPAKPTGKV VQNLHTPQGS PPNLPPLPAL PAFSLMGQVL LQYDAEQWK1 GFEQVQTQIV 42 0 

TEINQKVQAA VAKNNANMQA VGGSLGDTAR MVGEALIKQQ LSRKQNSILT7 MVSVQDEVKQ 480 

40 DLADLVPMMR TEITAFFASV QQHITEEVKK KTDALNAGQQ IRQAIQNLR^ SAWRAFLMGV 540 

SAVCLYLDTY NVAFDALFTA QWKWLSSGIY FATAPANVFG TRVLDNTIAS CGDFAGFLKL 600 

ETKSGDPYTH LLTGLDAGVE TRVYI PLTHD LYKNNNGNPL PSGGSSGHIG LPWGKAWCS 660 

YRIPVQDYGW VKPSVTVHAS TNRAHLNAPA AGGAVGATYL TKEYCAQLR^ GISASLIEKT 720 

VFSLDWEQGM LSDVPYLLVS ECLTQGIGRI VCGVTLSW 758 

<212> Type : PRT

<211> Length : 758 

SequenceName : SEQ ID 262
SequenceDescription : 



Sequence



<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols



<400> PreSequenceString :

MGRQVMQAGV LAGMVCAASG YAGVLTPQVS GTAQLQWGIA FQKNPRTGPG KHTHGFRTTN 60

SLTISLPLVS KHTHTRRGEA RSGVWAQLQL KDLAVELASS KSSTALSFTK PTASFQATLH 120

CYGAYLTVGT SPSCWNFAQ LWKPFVTRAY SEKDTRYAPG FSGSGAKLGY QAHNVGNSGV 180

DVDIGFLSFL SNGAWDSTDT THSKYGFGAD ATLSYGVDRQ RLLTLELAGN ATLDQNYVKG 240

TEDSKNENKT ALLWGVGGRL TLEPGAGFRF SFALDAGNQH QSNAHAQTQE RAILKAREVF 300

RRVEGKLVQN LPNIMMPPGI TEQTTLIEMV GLAALIAEGT LGSAIQTVIA AGALAALVSQ 360

LVPNIEQGVR DVFRSSDPRV VTAKLLAFLE RAPMNALNID ALLRMQWKWL SSGIYFATAG 420

TNIFGKRVFA TTRAHYFDFA GFLKLETKSG DPYTHLLTGL NAGVEARVYH PLTYIRYRNN 480

GGYELNGAVP PGTINMPILG KAWCSYRIPL GSHAWLAPHT SVLGTTNRFN IINPAGNLLN 540

ERALQYQVGL TFSPFEKVEL SAQWEQGVLA DAPYMGIAES IWSERHFGTL VCGMKVTW 598



65 <212> Type : PRT 

<211> Length : 598 

SequenceName : SEQ ID 263
SequenceDescription :
Sequence 

<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols
<400> PreSequenceString : 

MGRQVMQAGV LAGMVCAASG YAGVLTPQVS GTAQLQWGIA FQKNPRTGPG KHTHGFRTTN 60 

SLTISLPLVS KHTHTRRGEA RSGVWAQLQL KDLAVELASS KSSTALSFTK PTASFQATLH 120

CYGAYLTVGT SPSCWNFAQ LWKPFVTRAY SEKDTRYAPG FSGSGAKLGY QAHNVGNSGV 180

DVDIGFLSFL SNGAWDSTDT THSKYGFGAD ATLSYGVDRQ RLLTLELAGN ATLDQNYVKG 240

TEDSKNENKT ALLWGVGGRL TLEPGAGFRF SFALDAGNQH QSNAHAQTQE RAILKAREVF 300

RRVEGKLVQN LPNIMMPPGI TEQTTLIEMV GLAALIAEGT LGSAIQTVLA AGALAALVSQ 360

LVPNIEQGVR DVFRSSDPRV VTAKLLAFLE RAPMNALNID ALLRMQWKWL SSGIYFATAG 420
TNIFGKRVFA TTRAHYFDFA GFLKLETKSG DPYTHLLTGL NAGVEARVYI PLTYIRYRNN 480

GGYELNGAVP PGTINMPILG KAWCSYRIPL GSHAWLAPHT SVLGTTNRFN IINPAGNLLN 540

ERALQYQVGL TFSPFEKVEL SAQWEQGVLA DAPYMGIAES IWSERHFGTL VCGMKVTW 598 

<212> Type : PRT 
<211> Length : 598 
SequenceName : SEQ ID 264

SequenceDescription : 

Sequence 






<213> OrganismName : SARS coronavirus Frankfurt 1
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLECPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKU 1140
HTSPDVDFGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT 
<211> Length : 1255

SequenceName : SEQ ID 265 
SequenceDescription : 



Sequence



<213> OrganismName : SARS coronavirus HSR 1 
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 266
SequenceDescription :

Sequence

<213> OrganismName : SARS coronavirus ZJ01
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDXSPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 267
SequenceDescription :

Sequence

<213> OrganismName : SARS coronavirus TW1
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255



<212> Type : PRT 
<211> Length : 1255 
SequenceName : SEQ ID 268

SequenceDescription :

Sequence

<213> OrganismName : SARS coronavirus CUHK-Su10
<400> PreSequenceString : 

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPGSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVBPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255



<212> Type : PRT 
<211> Length : 1255

SequenceName : SEQ ID 269 
SequenceDescription : 

Sequence 


<213> OrganismName : SARS coronavirus Urbani 
<400> PreSequenceString : 

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT 
<211> Length : 1255 
SequenceName : SEQ ID 270

SequenceDescription : 

Sequence 



<213> OrganismName : SARS coronavirus
<400> PreSequenceString : 

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCAFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 271 
SequenceDescription : 

Sequence 

<213> OrganismName : SARS coronavirus Tor2
<400> PreSequenceString : 

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCAFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT

<211> Length : 1255 

SequenceName : SEQ ID 272 
SequenceDescription :
Sequence 

<213> OrganismName : SARS coronavirus GD01
<400> PreSequenceString : 

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFDNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFLP 240
AQDTWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS RDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKRISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRWVLS YELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKDFG 780
GFNFSQILPD PLKSTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT 
<211> Length : 1255

SequenceName : SEQ ID 273 
SequenceDescription : 

Sequence 

<213> OrganismName : SARS coronavirus CUHK-W1 
<400> PreSequenceString : 

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFDNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDTWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT

<211> Length : 1255 

SequenceName : SEQ ID 274 
SequenceDescription : 

Sequence



<213> OrganismName : SARS coronavirus BJ01 
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60
PFYSNVTGFH TINHTFDNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120
TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180
HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240
AQDTWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300
QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360
FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420
LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480
YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540
SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600
VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660
HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720
NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780
GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840
TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900
NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960
DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020
RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080
GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140
HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200
GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255



<212> Type : PRT 
<211> Length : 1255

SequenceName : SEQ ID 275 
SequenceDescription : 



Sequence 


<213> OrganismName : SARS coronavirus
<400> PreSequenceString : 

SGFRKMAFPS GKVEGCMVQV TCGTTTLNGL WLDDTVYCPR HVICTAEDML NPNYEDLLIR 60
KSNHSFLVQA GNVQLRVIGH SMQNCLLRLK VDTSNPKTPK YKFVRIQPGQ TFSVLACYNG 120
SPSGVYQCAM RPNHTIKGSF LNGSCGSVGF NIDYDCVSFC YMHHMELPTG VHAGTDLEGK 180
FYGPFVDRQT AQAAGTDTTI TLNVLAWLYA AVINGDRWFL NRFTTTLNDF NLVAMKYNYE 240
PLTQDHVDIL GPLSAQTGIA VLDMCAALKE LLQNGMNGRT ILGSTILEDE FTPFDWRQC 300
SGVTFQ 306 
<212> Type : PRT 

<211> Length : 306

SequenceName : SEQ ID 276 
SequenceDescription : 



Sequence 

<213> OrganismName : SARS coronavirus
<400> PreSequenceString : 

AIASEFSSLP SYAAYATAQE AYEQAVANGD SEWLKKLKK SLNVAKSEFD RDAAMQRKLE 60
KMADQAMTQM YKQARSEDKR AKVTSAMQTM LFTMLRKLDN DALNNIINNA RDGCVPLNII 120
PLTTAAKLMV WPDYGTYKN TCDGNTFTYA SALWEIQQW DADSKIVQLS EINMDNSPNL 180
AWPLIVTALR ANSAVKLQ 198

<212> Type : PRT 

<211> Length : 198 

SequenceName : SEQ ID 277 
SequenceDescription :



Sequence 



<213> OrganismName : SARS coronavirus 
<400> PreSequenceString :

AGNATEVPAN STVLSFCAFA VDPAKAYKDY LASGGQPITN CVKMLCTHTG TGQAITVTPE 60
ANMDQESFGG ASCCLYCRCH IDHPNPKGFC DLKGKYVQIP TTCANDPVGF TLRNTVCTVC 120 
GMWKGYGCSC DQLREPLMQ 139 
<212> Type : PRT 
<211> Length : 139

SequenceName : SEQ ID 278 
SequenceDescription :
Sequence



<213> OrganismName : SARS coronavirus
<400> PreSequenceString :

NNELSPVALR QMSCAAGTTQ TACTDDNALA YYNNSKGGRF VLALLSDHQD LKWARFPKSD 60 
GTGTIYTELE PPCRFVTDTP KGPKVKYLYF IKGLNNLNRG MVLGSLAATV RLQ 113 

<212> Type : PRT 
<211> Length : 113

SequenceName : SEQ ID 279
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNKIFKVIWN PATGNYTVTS ETAKSRGKKS GRSKLLISAL VAGGMLSSFG ALANAGNDNG 60
QGVDYGSGSA GDGWVAIGKG AKANTFMNTS GSSTAVGYDA IAEGQYSSAI GSKTHAIGGA 120
SMAFGVSAIS EGDRSIALGA SSYSLGQYSM ALGRYSKALG KLSIAMGDSS KAEGANAIAL 180
GNATKATEIM SIALGDTANA SKAYSMALGA SSVASEENAI AIGAETEAAE NATAIGNNAK 240
AKGTNSMAMG FGSLADKVNT IALGNGSQAL ADNAIAIGQG NKADGVDAIA LGNGSQSRGL 300
NTIALGTASN ATGDKSLALG SNSSANGINS VALGADSIAD LDNTVSVGNS SLKRKIVNVK 360
NGAIKSDSYD AIKfGSQLYAI SDSVAKRLGG GAAVDVDDGT VTAPTYNLKN GSKNNVGAAL 420
AVLDENTLQW DQTKGKYSAA HGTSSPTASV ITDVADGTIS ASSKDAVNGS QLKATNDDVE 480
ANTANIATNT SNIATNTANI ATNTTNITNL TDSVGDLQAD ALLWNETKKA FSAAHGQDTT 540
SKITNVKDAD LTADSTDAVN GSQLKTTNDA VATNTTNIAN NTSNIATNTT NISNLTETVT 600
NLGEDALKWD KDNGVFTAAH GTETTSKITN VKDGDLTTGS TDAVNGSQLK TTNDAVATNT 660
TNIATNTTNI SNLTETVTNL GEDALKWDKD NGVFTAAHGN NTASKITNIL DGTVTATSSD 720
AINGSQLYDL SSNIATYFGG HASVNTDGVF TGPTYKIGET NYYNVGDALA AINSSFSTSL 780
GDALLWDATA GKFSAKHGTN GDASVITDVA DGEISDSSSD AVMGSQLHGV SSYWDALGG 840
GAEVNADGTI TAPTYTIANA DYDNVGDALN AIDTTLDDAL LWDADAGENG AFSAAHGKDK 900
TASVITNVAN GAISAASSDA INGSQLYTTN KYIADALGGD AEVNADGTIT APTYTIANAE 960
YNNVGDALDA LDDNALLWDE TANGGAGAYN ASHDGKASII TNVANGSISE DSTDAVNGSQ 1020
LNATNMMIEQ NTQIINQLAG NTDATYIQEN GAGINYVRTN DDGLAFNDAS AQGVGATAIG 1080
YNSVAKGDSS VAIGQGSYSD VDTGIALGSS SVSSRVIAKG SRDTSITENG WIGYDTTDG 1140
ELLGALSIGD DGKYRQIINV ADGSEAHDAV TVRQLQNAIG AVATTPTKYF HANSTEEDSL 1200
AVGTDSLAMG AKTIWGDKG IGIGYGAYVD ANALNGIAIG SNAQVIHVNS IAIGMGSTTT 1260
RGAQTNYTAY NMDAPQNSVG EFSVGSADGQ RQITNVAAGS ADTDAVNVGQ LKVTDAQVSQ 1320
NTQSITNLDN RVTNLDSRVT NIENGIGDIV TTGSTKYFKT NTDGVDASAQ GKDSVAIGSG 1380
SIAAADNSVA LGTGSVATEE NTISVGSSTN QRRITNVAAG KNATDAVNVA QLKSSEAGGV 1440
RYDTKADGSI DYSNITLGGG NGGTTRISNV SAGVNNNDW NYAQLKQSVQ ETKQYTDQRM 1500
VEMDNKLSKT ESKLSGGIAS AMAMTGLPQA YTPGASMASI GGGTYNGESA VALGVSMVSA 1560
NGRWVYKLQG STNSQGEYSA ALGAGIQW 1588



<212> Type : PRT 

<211> Length : 1588 

SequenceName : SEQ ID 280 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MPASAVGALG EASYTVTANV TDSAGMSNSA SHNVQVNTAL PGVTINPVAT DDIINAAESG 60
NAQTISGQVT GAAAGDTVTV TLGGKTYTAT VQGNLSWSVD VPAADIQAIG NGNLTVNASV 120
TNGVGNTGSG SRDITIDANL PGLRVDTVAG DDWNSIEHA QALVITGSSS GLAAGAALTV 180
VINTVTYAAT VLADGTWSVG VPAADVSNWP AGTVNITVSG TNTAGTTSTI THPVTVDLAA 240
VAISINTVSG DDVINAAEKG ADLTLSGSTS GVEVGQTVTV TFGGKTYTAT VAGDGSWTTT 300
VPAADLSVLR DGDATVQASV STINGNTASA THAYSVDATA PTLAINTIAT DDILNAAEAG 360
NPLTISGSST AEAGQTVTVT LNGVTYSGSV QADGSWSVSL PTADLSNLTA SQYTVSASVS 420
DKAGNPASAN HGLAVDLTVP VLTINTVSGD DIINAAEHGQ ALVISGSSTG GEAGDVITVT 480
LNSKTYTTML DASGNWSVGV PAADVTALGS GPQTITAAIT DAAGNSDDAS RTVTVNLAAP 540
TIGINTIATD DVIKATEKGA DLQITGTSNQ PAGTTITVTL NGQNYTATTD SNGNWSATVP 600
ASAVSALGEA NYTVTANVTD TAGNSNSASH NVLVNSALPA VTINAVATDD IINAAESGNA 660
QTISGQVTGA AQGDTVTVTL GGNTYTATVQ SNLSWSVDVP AADIQALGNG DLTVNASVTN 720
GVGNTGSGSR DITIDANLPG LRVDTVAGDD VINSIEHNQA LVITGSSSGL TAGTALTVEI 780
NNVTYGATVL ADGTWSLGVP AVDVSNWPAG TVNITVSGTN SAGTTSTITH PVTVDLAGVA 840
ITINTLSGDD VINAVEKGET LWSGSTSGV EAGQTVTVTF GGKNYTTTVE ANGSWTVNVP 900
PADLAALPDG AGNVQASVSN INGNSAQADR AYSVDATAPL VTINTIASDD ILNVSEAGAG 960
ITISGTTTAQ AGQTLTVTLN NNTYQTTVLA DGTWSVNVPA ADLSGLTASS YTVTATVSDK 1020
AGNPASADHA LWDITAPDL TINTVAGDDI INAIEHGQAL WSGTSTGAA AGDWTVTLN 1080
GKNYTTTLDA SGNWSVGIPA ADVTALATGS QTITASLSDR AGNSDSTTHD VTVDLSGPTL 1140
TINTVSGDDI INAAEIWAQ TISGQVTGTA VAGNTVIVTI GGNQYNATVQ SDLSWSVSVP 1200
ANVLQALGNG ELTISASLTN SANNTGTATH DIVIDANLPG LRVDTVAGDD VINSIEHTQA 1260
LVITGSSSGL AAGAALTWI NSVTYGATVL ADGSWSVGVP VADVTNWPAG TVNIAVSGTN 1320
TAGTTTSISH PVTVDLAAVA ITINTLSTDD VINAAEKGSD LQLSGTTSGV EAGQTITVIF 1380
GGKSYTTTVA ADNTWGLTIP AVDVATLPDG AANVQASVSN VAGNSTQATH AYSVDATAPS 1440
VTINTIATDD ILMAAEAGSA LTISGTSTAE AGQTVTVTLN GVNYSGNVQA DGSWSVSVPT 1500
GDLASLTASS YTVNASVSDK ARNSASATHN LTVDLAAPW TINTVAGDDI INATEHGQAQ 1560
IISGSATGAT TGNTVSVTIG TTTYTTVLDA NQimSXQVPA SVISALAQGD VTITATVTDS 1620
AGNSGTASHT VTVALGAPVL AINTIAVDDI INAAEKGADL AITGTSNQPA GTQITVTLNG 1680
QNYTTTADAS GNWSVTVPAS RVSALGEATY TVTAAATDAD GNSGSASHNV QVNTALPGVT 1740
INWATDDII NAAEAGVEQT ISGQVTGAAA GDTVTVTLGG ATYTATVQAN LSWSVDVPAS 1800
ALQELGNGEL TISASVTNSV GNTGNGTREI TIDANLPGLR VDTVAGDDW NIIEHGQALV 1860
ITGSSSGLAA GSNVTLTING QTYVAAVLAD GTWSVGVPAV DVSAWPAGSV TIAASGSTSA 1920
GNPVSVTHPV TVDLSAVAVS INAITADDVI NAAEKGAALT LSGSTSGVEA GQTVTVTFGG 1980
KTYSATVAAN GSWSTSVPAA DMAALRDGDA SAQASVSNVN GNSATTTHAY SVDASAPTVT 2040
INTIAGDDIL NAAEAGAALT ITGSSTAEAG QTVTVTLNGT NYTGTVQTDG SWSVSVPSAD 2100
LSTLTASNYT VNAAVSDKAG NPASVNHNLT VDTSVPWTI NTVAGDDVIN ATEHAQAQII 2160
SGSATGAATG STVTVTIGTN TFTTVLDASG NWSVGVPASV VSALANGTVT INASVTDAGG 2220
NSGSATHQVT VNTGLPTITF NAISGDNILN ADEKGQPLTI SGGSTGLATG AQVTVTLNGH 2280
NYSATTDASG NWTLTVPVSD LAALGQANYT VSASATSAAG NTASSQANLL VDSGLPDVTI 2340
NTVAGDDIIN AAEAGADQTI SGWTRAAAG DTVTVTLGGN TYTATVQSNL SWSVSVPTAD 2400
LQALGNGDLT ITASVTNANG NTGSGTRDIT IDANLPGLRV DTVAGDDIVN SIEHGQALVI 2460
TGGSSGLNAG AVLTVTINSV AYSATVQADG SWSVGIPAAN VSAWPAGPLT VEVDGQSSAN 2520
NPVSVSHPFT VDLTAVAISI NTVASDDVIN AAEKGTNLTL SGSTSGIESG QTVTVTFGGK 2580
TYTASVAANG SWSVNVPAAD LATLPEGAAN VQASVSSASG NSASATHAYS VDASAPTLTI 2640
NTIASDDILN AAEAGSPLTI SGTSTAETGQ TVTVTLNGAT YTGTVQADGS WSVSVPTSAL 2700
GALNASNYTV SATVNDKAGN PGSASHNLAV DTTAPVLTIN TVAGDDIIND AEHAQALVIS 2760
GTSSGGEAGD WSWLNGKT YTTTLDASGN WSVGVPAADV TALGSGAQTI TASVSDRAGN 2820
SDDASRTVTV SLSAPVISIN TIAGDDVINA TEKGSDLALS GTSDQPAGTA ITVTLNGQNY 2880
SATTDASGNW SVTVPASAVS ALGEATYSVT ASVTNAQGNS STASHNVQVN TALPGITINP 2940
VATDDIINAS EAGSAQTISG QVTGAAAGST VTVELGGKTY TATVQADLSW NVSVPAADWQ 3000
ALGNGELTVN ASVTNAVGNT GSGTRDITID ASLPGLRVDT VAGDDWNII EHAQAQVITG 3060
SSSGFAAGTA LTWINNQTY AATVLANGSW SVGVPATDVS NWPAGTLNIT VSGANSAGTQ 3120
TSITHPLTVD LTAVAISMNS ITSDDAINAA EKGAALTLSG STSGVEAGQT VTVTFGGKTY 3180
TTTVAANGSW STTVPAADLA ALRDGDASAQ VRVTNVNGNS ATATHEYSVD SAAPTVTINT 3240
IASDNIINAS EAAAGVTVSG TSTAQTGQTL TVTLNGTNYQ TTVQTDGSWS LTLPASDLTA 3300
LANNGYTLTA TVSDLAGNLG SASKGVTVDT TAPVISFNTV AGDDVINNVE HIQAQIISGT 3360
ATGAVAGDRL WTIAGQQYV TSTDASGNWS VGVPASVISG LADGTVTISA TITDSAGNSS 3420
TQTHNVQVNT AAVSLSVSTI SGDNLINAAE AGSALTLSGT GTNFATGTW TVLLNGKGYS 3480
ATIQSNGSWS VNVPAADVAA LSDGTSYTVS ASAQDSAGNG NSSTQTHNVQ VNTAAVSLSV 3540
STISGDNLIN AAEAGSALTL SGTGTNFATG TWTVLLNGK GYSATIQSNG SWSVNVPAAD 3600
VAALSDGTSY TVSASAQDSA GNSATASRSV AVDLTAPVIS INTVSTDDRL NAAEQQQPLT 3660
LNGSTSAEVG QTVTVTFGGK TYTATVAANG TWALNVPAVD LAALGQGAQT ITASVNDRAG 3720
NPGQATHALT VDTVAPTVTI ATVAGDDIIN NAEQLAGQTI SGTTTAEVGQ TVTVTFNGQT 3780
WSATVGSGGS WSVFIPAQQF AGLSDGSYTI SATVSDQAGN PGSASRGVTL NGDVPTVTIN 3840
TFAGDDWNA AEHGSSLVIS GTTTAPVGQT LTLTLNGKTY TTTVQTGGSW SYTLGSADVT 3900
ALADGNAYVI NASVSNAIGN TGSSNHTITV DLSAPAMGIN IDSLQADTGL SASDFITSVS 3960
PVWNGSLTA ALASNETAQI SIDGGTTWTT LTVTGTTWRY NDSRTLTDGN YLYQVRVIDA 4020
AGNVGATDSQ NWIDTTAPD PAVKTIAISA ITTDMGLITN DFVTSDTTLA VSGTLGATLS 4080
AGEFAQISLD GGVTWTTLTV VGTSWSYADG HTLTDGTWNY TVRWDLAGN VGQTATQNW 4140
VDTTSPEAAK SITITGISDD TGTSSSDFIT SDTTLTVRGV LGAALGANEF AQISTDNGAT 4200
WVNVTVAADS LNWSYVDGRT LTNGTTTWQV RWDLAGNVG ATSSQSALID TVNPAQVLTI 4260
ASISTDTGSS ATDFITSDTM LTLTGSLGAG LASGEVAQIS LDSGATWTTL TTNGTQWTYT 4320
DSRTLTDGSY VYQVRVLDLA GNTGPWSKT VWDTINPTA TPTIVSYTDD VGQRQGTLSS 4380
SQATDDTTPL LNGVLSAPLA SGEWYLYRN GLLLGAVTMV GALNWTYSDS GLVSGAYTYS 4440
ARWDLAGNI TSSSDFVLTV DTSIPTTLAQ ITSQTTRDTT PIISGVITAA LASGQYVEW 4500
INGKTYTSEP GGAVWDPAH NTWYVQLPDT DALTVSATAY TVTAQVKSSA GNGNNANISN 4560
GTVTVNAAID YTPTWTTASK TTAWGLTYGL DSHGMWTVLA NQQVMQSTDP LTWSKTALTL 4620
YQSGNNYATS SIADYDRNGT GDLFITRDDY GTGYINGFTN NGDGTFSSAI QVTVGTLTWY 4680
GSIVAFDKEG DGYLDFWIGD AGGPDSNTFL WNNAGTLVGN STTSNSGGSA TVGGAVTGYL 4740
SLNEGSGVDL NNDGRIDLVQ HTYNLNNYYT LSSLINQGNG TFVWGQNTTN TFLSGAGSGA 4800
MSSSVSMTWA DFDGDGDMDL FLPASQGRAN YGSLLFNTNG VLGCPVAVGA TATTYASQFS 4860
LAVDWNHDGL MDIARIAQTG QSYLYTNVSN ASNWTQSALG GSQSGTTSGV AAMDYDWDGA 4920
VDVLVSKQSG SVFLSRNTNT VSYGTSLHLR ITDPNGINVY YGNTVKLYNS AGVLVATQII 4980
NPQSGMGVND TSALVNFYGL NAGETYNAVL IKSTGTTASN IDQTVNTSWG GLQATDATHA 5040
YDLSAEAGTA SNNGKFVGTG YNDTFFATAG TDTYDGSGGW VYSSGTGTWL ANGGMDWDF 5100
RLSTVGVTAN LSSTAAQATG FNTSTFTNIE GISGSNFNDI LTGSSGDNQL EGRGGNDTLN 5160
IGNGGHDTLL YKLLNASDAT GGNGSDWNG FTVGTWEGTA DTDRIDIREL LQGSGYTGNG 5220
KASYVNGVAT LDAQAGMIGD FVKVTQSGSD TIVQIDRDGT GGTFATTNW TLTGVHTDLA 5280
TLLANHQLMV V 5291
<212> Type : PRT

<211> Length : 5291 

SequenceName : SEQ ID 281 

SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MGVHTAEATL PNGNNDTKIV NIAPDASNAQ VTLNIPAQQV VTNNSDSVQL TATVKDPSNH 60
PVAGITVNFT MPQDVAANFT LENNGIAITQ ANGEAHVTLK GKKAGTHTVT ATLSNNNTSD 120
SQPVTFVADK TSALWLQIS KNEITGNGVD SATLTATVKD QFDNEVNNLP VTFSTASSGL 180
TLTPGESNTN ESGIAQATLA GVAFGEQTVT ASLANNGASD NKTVHFIGDT AAAKIIELTP 240
VPDSIIAGTP QNSSGSVITA TWDNNGFPV KGVTVNFTSN AATAEMTNGG QAVTNEQGKA 300
TVTYTNTRSS IESGARPDTV EASLENGSST LSTSINVNAD ASTAHLTLLQ ALFDTVSAGD 360
TTNLYIEVKD NYGNGVPQQE VTLSVSPSEG VTPSNNAIYT TNHDGNFYAS FTATKAGVYQ 420
VTATLENGDS MQQTVTYVPN VANAEISLAA SKDPVIANNN DLTTLTATVA DTEGNAIANS 480
EVTFTLPEDV RANFTLGDGG KWTDTEGKA KVTLKGTKAG AHTVTASMAG GKSEQLWNF 540
IADTLTAQVN LNVTEDNFIA NNVGMTRLQA TVTDGNGNPL ANEAVTFTLP ADVSASFTLG 600
QGGSAITDIN GKAEVTLSGT KSGTYPVTVS VNNYGVSDTK QVTLIADAGT AKLASLTSVY 660
SFWSTTEGA TMTASVTDAN GNPVEGIKVN FRGTSVTLSS TSVETDDRGF AEILVTSTEV 720
GLKTVSASLA DKPTEVISRL LNAKADINSA TITSLEIPEG QVMVAQDVAV KAHVNDQFGN 780
PILNESVTFS AEPPEHMTIS QNIVSTDTHG IAEVTMTPER NGSYMVKASL ANGSSYEKDL 840
WIDQKLTLS ASSPLIGVNS PTGATLTATL TSANGTPVEG QVINFSVTPE GATLSGGKVR 900
TNSSGQAPW LTSNKVGTYT VTASFHNGVT IQTQTIVKVT GNSSTAHVAS FIADPSTIAA 960
TNSDLSTLKA TVEDGSGNLI EGLTVYFALK SGSATLTSLT AVTDQNGIAT TSVRGAITGS 1020
VTVSAVTTAG GMQTVDITLV AGPADASQSV LKNNRSSLKG DFTDSAELHL VLHDISGNPI 1080
KVSEGLEFVQ SGTNAPYVQV SAIDYSKNFS GEYKATVTGG GEGIATLIPV LNGVHQAGLS 1140
TTIQFTRAED KIMSGTVLVN GANLPTTTFP SQGFTGAYYQ LNNDNFAPGK TAADYEFSSS 1200
ASWVDVDATG KVTFKNVGSK WERITATPKT GGPSYIYEIR VKSWWVNAGD AFMIYSLAEN 1260
FCSSNGYTLP LGDHLNHSRS RGIGSLYSEW GDMGHYTTEA GFHSNMYWSS SPANSNEQYV 1320
VSLATGDQSV FEKLGFAYAT CYKNL 1345
<212> Type : PRT 
<211> Length : 1345 

SequenceName : SEQ ID 282 

SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

MSLIIDVISR KTSVKQTLIN PGDVTWIYE PSWQVHAQA SAVARYVREG NDLLIYMQDG 60 

TVIRCNGYFL QAANTAEQSE LVFADGQQLT HIT FADTAAG GLAPVELTAQ TTAIESIAPF 120 

LDTVAQTSAF P W G W LAG AAV GGGALGALLA SGGDGDSKTE VINNPTPPAE PGNATPSFLV 18 0 

TDNQGDQRGI LATNDITDDT TPTFSGSGQA GATIQIKDSN GNTIASTQVD NNGHWSVSLP 240 

55 TQSAGEHTWS WQIVGSTIT DAGSITLTID NSQASVQVAT TAGDNI INAS EQAAGFTLSG 3 00 

TSSHLAQGTE LTVTLNGKTY TTSVGANGAW SVQVPTADAQ ALGEGNQAVL VSGKDATGNT 3 60 

VTGAQLLTVD TQPPTLAINT IAQDNIISAA EHNVALVLSG TSNAEAGQTV TLTVNGKSHT 420 

ATVGSDGTWQ VTLPATEVQA LAEGNYAWA SVSDRAGNTT SHSANFTVDT SAPWSVNTV 480 

AGDD I LNNAE QAVAQIISGQ VSGASPGDTV TVKLGTHVLT GIVLADGSWN VALDPAVTRT 540 

60 LDRGANT I FV TVTDAAGNTG AASRAITLVG VSPLITINTV SGDDIISGAE KGAPLTLTGS 600 

■ TQQAETGQTV TVTLAGQSFT TTVQADGSWS LTVPAAAMGN LPDGAVAITA SVTDLSGNTG 660 

NTSRTITVDS QAPALSIDPL TADNI INAAE SGQDLPITGT TDAQPGQTVT VTL2SJGQTYQG 72 0 

WQPDGTWSV TVPAANVGAL ADGNATVTAS VNDVAGNPSS VSRVALVDAT PPWTINPVA 780 

TDNVINTPEH AQAQIISGTV TGAQAGDIVT VTLNNVDYTT WDGSGNWSL GVPASWSGL 840

ADGSYPVSVS VTDKAGNTGS QSLTVTVNTA APLIGINSIA GDDVINASEK GADLQITGTS 900

DQPVNTAITV TLNGQNYTTT TDASGNWSVT VPASAVTALG QANYTVTAAV TSDIGNSATA 960

SHNVLVDSAL PGVTINPVAT DDIINAAEAG VAQTISGQVT GAEDGDTVTI TLGGNTYTAT 1020
VGSNLTWSVD VPAADIQALG NGDLTVNASV TNQNGNTGSG TRDITIDANL PGLRVDTVAG 1080

DDWNI IEHG QALWTG S S S GLAESTPLTV TINNVEYTTA VQADGSWSVG VTAAQVSAWP 1140 

AGTVNIAVSG ESSAGNSVSI THPVTVDLTP AAITINTIAT DDVINAAEKG ADLTLSGTTT 1200

NVEPGQTVTV TFGGKNYTAS VASDGSWTAT VPAADLASLP EGSASALASV SNINGNSASA 1260

VHNYSVDSSA PTIIINTVAS DNIVNASEAD AGVTVSGSTT AEAGQIVTIT LNSPTVQTYQ 1320

ATVQADGSWS INIPAADLEA LTDGSHTLTA TVNDKAGNPA STTHNLAVDL TVPVLTINTI 1380

AGDDIINATE HGQALVISGS STGGEAGDW TVTLNSKTYT TTLDASGNWS VGVPAADVTA 1440

LGSGPQTVTA TVTDAAGNSD N 1461 
<212> Type : PRT 
<211> Length : 1461

SequenceName : SEQ ID 283

SequenceDescription :

Sequence


<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNRIYRVIWN CTLQVFQACS ELTRRVGKTS TVNLRKSSGL TTKFSRLTLG VLLALSGSVS 60 

GASLEVDNGQ ITNIDTDVAY DAYLVGWYGT GVLNILAGGN ASLTTITTSV IGGNEDSEGT 120

VNVLGGTWRL YDSGNNARPL NVGQSGTGTL NIKQKGHVDG GYLRLGTQAA GVGTVNVEGE 180

DSVLTTELFE IGSYGTGSLN ITDKGYVTSS IVAILGYQAN SNGKWVE KG GEWLI KNNDS 240

SIEFQIGNQG TGEATIREGG LITAENTIIG GNATGVGTLN VQDQDSVITV RRLYNGYFGN 300

GAVNISNNGL INNKEYSLVG VQDGSHGWN VTDKGHWNFL GTGEAFRYIY IGDAGDGELN 360

VSREGKVDSG IITAGMKETG TGNLTVKDKN SVITNLGTNL GYDGHGEMNI SNEGLWSNG 420

GSSLGYGETG VGKVSITTGG IWEVNKNVYT TIGVAGVGNL NISDGGKFVS QNITFLGDKA 480

SGIGTLNLMD ATSSFDTVGI NVGNFGSGIV NVSNGATLNS TGYGFIGGNA SGKGIVNIST 540

DSLWNLKTSS TNAQLLQVGV LGTGELNITT GGIVKARDTQ IALNDKSKGD VRVDGQNSLL 600

ETFNMYVGTS GTGTLTLTNS GTLNVEGGEV YLGVFEPAVG TLNIGAAHGE AAADAGFITN 660

ATKVEFGSGE GVFVFNHTNN SDAGYQVDML ITGDDKDGKV IHDAGHTVFN AGNTYSGKTL 720

VNDGLLTTAS HTADGVTGMG SSEVTIASPG TLDILASTNS AGDYTLTNAL KGDGLMRVQL 780

SSSDKMFGFT HATGTEFAGV AQLKDSTFTL ERDNTAALTH AMLQSDIENT TSVUVGEQSI 840 

GGLAMNGGTD IFDTDIPAAT LAEGYISVDT LWGASDYTW KGRNYQVNGT GDVLIGVPKP 900 

WNDPMANNPL TTLNLLEHDD NHVGVQLVKA QTVTGSGGSL TLRDLQGDEV EADKTLHIAQ 960 

NGTWAEGDY GFRLTTAPGD GLYVNYGLKA LNIHGGQKLT LAEHGGAYGA TADMSAKIGG 1020 

EGDLAINTVR QVSLSNGQND YQGATYVQMG TLRTDADGAL GNTRELNISN AAIVDLNGST 1080

QTVETFTGQM GSTVLFKEGS LTVNKGGISQ GELTGGGNLN VTGGTLAVEG LNARYNALTS 1140 

VSPNAEVSLD NTQGLGRGNI ANDGLLTLKN VTGELRNSIS GKGIVSATAR TDVELDGDNS 1200 

RFVGQFNIDT GSALSVNEQK NLGDASVINN GLLTISTERS WAMTHSISGS GDLTKLGTGI 1260 

LTLNNDSSAY QGTTDIVGGE IAFGSDSAIN TASQHINIHN SGVMSGNVTT AGDVNVMSGG 1320

TLRVAKTTXG ESAATWRMAA RFK 1343
<212> Type : PRT 
<211> Length : 1343 

SequenceName : SEQ ID 284 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MGIKQHNGNT KADRLAELKI RSPSIQLIKF GAIGLNAlLF SPLLIAADTG SQYGTNITIN 60
DGDRITGDTA DPSGNLYGVM TPAGNTPGNI NLGNDVTVNV NDASGYAKGI IIQGKNSSLT 120
ANRLTVDWG QTSAIGINLI GDYTHADLGT GSTIKSNDDG IIIGHSSTLT ATQFTIENSN 180
GIGLTINDYG TSVDLGSGSK IKTDGSTGVY IGGLNGNNAN GAARFTATDL TIDVQGYSAM 240
GINVQKNSW DLGTNSSIKT SGDNAHGLWS FGQVSANALT VDVTGAAANG VEVRGGTTTI 300

GADSHISSAQ GGGLVTSGSD ATINFSGTAA QRNSIFSGGS YGASAQTATA VINMQNTDIT 360

VDRNGSLALG LWALSGGRIT GDSLAITGAA GARGIYAMTN SQIDLTSDLV IDMSTPDQMA 420
IATQHDDGYA ASRINASGRM LINGSVLSKG GLINLDMHPG SVWTGSSLSD NVNGGKLDVA 480
MNNSVWNVTS NSNLDTLALS HSTVDFASHG STAGTFTTLN VENLSGNSTF IMRADWGEG 540 
NGVKPWA 547 

<212> Type : PRT

<211> Length : 547

SequenceName : SEQ ID 285
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MGIDSRNDIP EGIATLGAFM GYSHSHIGFD RGGHGSVDSY SLGGYASWEH ESGFYLDGW 60 

KLNRFESNVA GKMSSGGAAN GSYHSNGLGG HIETGMRFTD GNWNLTPYAS LTGFTADNPE 120

YHLSNGMESK SVDTRSIYRE LGATLSYNMR LGNGMEVEPW LKAAVRKEFV DDNRVKVNSD 180

GNFVNDLSGR RGIYQAGIKA SFSSTLSGHL GVGYSNGAGM ESPWNAVAGV NWSF 234



<212> Type : PRT 
<211> Length : 234 

SequenceName : SEQ ID 286 
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

MKKKVLAIAL VTVFTGMGVA QAADVTAQAV ATWSATAKKD TTSKLWTPL GSLAFQYAEG 60
IKGFNSQKGL FDVAIEGDST ATAFKLTSRL ITNTLTQLDT SGSTLNVGVD YNGAAVEKTG 120 
DTVMIDTANG VLGGNLSPLA NGYNASNRTT AQDGFTFSII SGTTNGTTAV TDYSTLPEGI 180 
WSGDVSVQFD ATWTS 195 
<212> Type : PRT

<211> Length : 195 

SequenceName : SEQ ID 287 

SequenceDescription : 

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MTAESYDDNY LDDEDADWTA TGQGQKSAGD TSFTLAWKPG EEGQKGLIGW FESGDVRAYK 60 
IRFPNGTVDV FRGWVSSIGK AVTAKEVITR TVKVTNVGKP SVAEERSKIT PVSAIKVTPT 120

SGTVAKGKTT TLTVSFEPES ATDKTFRAVS ADPSKATISV KDMTITVHGV ATGKVQIPW 180

SGNGQFAAVA EVTVTEAGAA G 201

<212> Type : PRT

<211> Length : 201
SequenceName : SEQ ID 288

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MTAESYDDNY LDDEDADWTA TGQGQKSAGD TSFTLAWKPG EEGQKGLIGW FESGDVRAYK 60 

IRFPNGTVDV FRGWVSSIGK AVTAKEVITR TVKVTNVGKP SVAEERSKIT PVSAIKVTPT 120 

SGTVAKGKTT TLTVSFEPES ATDKTFRAVS ADPSKATISV KDMTITVNGV ATGKVQIPW 180

SGNGQFAAVA EVTVTEAGAA G 201



<212> Type : PRT 

<211> Length : 201 

SequenceName : SEQ ID 289
SequenceDescription :


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MLYNIPCRIY ILSTLSLCIS GIVSTATATS SETKISNEET LWTTNRSAS NLWESPATIQ 60

VIDQQTLQNS TNASIADNLQ DIPGVEITDN SLAGRKQIRI RGEASSRVLI LIDGQEVTYQ 120

RAGDNYGVGL LIDESALERV EWKGPYSVL YGSQAIGGIV NFITKKGGDK LASGWKAVY 180

NSATAGWEES IAVQGSIGGF DYRINGSYSD QGNRDTPDGR LPNTNYRNNS QGVWLGYNSG 240

NHRFGLSLDR YRLATQTYYE DPDGSYEAFS VKIPKLEREK VGVFYDTDVD GDYLKKIHFD 300

AYEQTIQRQF ANEVKTTQPV PSPMIQALTV HNKTDTHDKQ YTQAVTLQSH FSLPANNELV 360

TGAQYKQDRV SQRSGGMTSS KSLTGFINKE TRTRSYYESE QSTVSLFAQN DWQFADHWTW 420

TMGVRQYWLS SKLTRGDGVS YTAGIISDTS LARESASDHE MVTSTSLRYS GFDNLELRAA 480

FAQGYVFPTL SQLFMQTSAG GSVTYGNPDL KAEHSNNFEL GARYNGNQWL IDSAVYYSEA 540

KDYIASLICD GSIVCNGNTN SSRSSYYYYD NIDRAKTWGL EXSAEYNGWV FSPYISGNLI 600

RRQYETSTLK TTNTGEPAIN GRIGLKHTLV MGQANIISDV FIRAASSAKD DSNGTETNVP 660

GWATLNFAVN TEFGNEDQYR INLALNNLTD KRYRTAHETI PAAGFNAAIG FVWNF 715
<212> Type : PRT

<211> Length : 715 

SequenceName : SEQ ID 290 
SequenceDescription :


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MTKMSRYALI TALAMFLAGC VGQREPAPVE EVKPAPEQPA EPQQPVPTVP SVPTIPQQPG 60
PIEHEDQTAP PAPHIRHYDW NGAMQPMVSK MLGADGVTAG SVLLVDSVNN RTNGSLNAAE 120
ATETLRNALA NNGKFTLVSA QQLSMAKQQL GLSPQDSLGT RSKAIG-IARN VGAHYVLYSS 180 
ASGNVNAPTL QMQLMLVQTG EIIWSGKGAV SQQ 213 
<212> Type : PRT 

<211> Length : 213

SequenceName : SEQ ID 291 
SequenceDescription : 



Sequence 


<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKSKVLALLI PALLGAGAAH AAEVYNKDGN KLDLYGKVDG LHYFSDNSAK DGDQSYARLG 60 
FKGETQINDQ LTGYGQWEYN IQANNTESSK NQSWTRLAFA GLKFSDYGSF DYGRNYGLDR 120 
YAA 123
<212> Type : PRT 
<211> Length : 123 

SequenceName : SEQ ID 292 

SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MATPNPLEPV KGAGTTLWVY NGKGDAYANP LSDDDWQRLA KVKDLXPGEM TAEPYDDNYL 60
DDEDADWTAT GQGQKSAGDT SFTLAWKPGE EGQKGLIGWF ESGDVRAYKI RFPNGTVDVF 120
RGWVSSIGKA VTAKEVITRT VKVTNVGKPS VAEERSEITP ATAIKVTPTS GTVAKGKTTT 180
LTVSFEPESA TDKTFRAVSA DPSKATISVK DMTITVNGVA TGKVQXPWS GNGQFAAVAE 240
VTVTEAGAAG 250

<212> Type : PRT

<211> Length : 250 

SequenceName : SEQ ID 293 
SequenceDescription : 

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MATPNPLEPV KGAGTTLWVY NGKGDAYANP LSDDDWQRLA KVKDLXPGEM TAEPYDDNYL 60 
DDEDADWTAT GQGQKSAGDT SFTLAWKPGE EGQKGLIGWF ESGDVRAYKI RFPNGTVDVF 120
RGWVSSIGKA VTAKEVITRT VKVTNVGKPS VAEERSEITP ATAIKVTPTS GTVAKGKTTT 180

LTVSFEPESA TDKTFRAVSA DPSKATISVK DMTITVNGVA TGKVQXPWS GNGQFAAVAE 240
VTVTEAGAAG 250

<212> Type : PRT
<211> Length : 250

SequenceName : SEQ ID 294 
SequenceDescription : 






Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MGWTDMLPEF GGDSYTNADN FMTGRANGVA TYRNTDFFGL VNGLNFAVQY QGNNEGASNG 60

QEGTNNGRDV RHENGDGWGL STTYDLGMGF SAGAAYTSSD RTNDQVNHTA AGGDKADAWT 120

AGLKYDANNI YLATMYSETR NMTPFGDSDY AVANKTQNFE VTAQYQFDFG LRPAVSFLMS 180

KGRDLHAAGG ADNPAGVDDK DLVKYADVGA TYYFNKNMST YVDYKINLLD EDDSFYAANG 240 

ISTDDIVALG LVYQF 255 
<212> Type : PRT

<211> Length : 255 

SequenceName : SEQ ID 295 
SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString :

MGFIMKLTKT ALCTALFATF TFSANAQTYP DLPVGIKGGT GALIGDTVYV GLGSGGDKFY 60
TLDLKDPSAQ WKEIATFPGG ERNQPVAAAV DGKLYVFGGL QKNEKGELQL VNDAYRYNPS 120

DNTWMKLPTR SPRGLVGSSG ASHGDKVYIL GGSNLSIFNG FFQDTVAAGE DKAKKDEIAA 180
AYFDQRPEDY FFTTELLSYE PSTNKWRNEG RIPFSGRAGA AFTIQGNELV WNGEIKPGL 240

RTAETKQGKF TAKGVQWKNL PDIiPAPKGKS QDGI^GALSG YSNGHYLVTG GANFPGSIKQ 300

FKEGKLHAHK GLSKAWHNEV YTLNNGKWRI VGELPMNIGY GFSVSYNNKV LLIGGETDGG 360 
KALTSVKAIS YDGKKLTIE 379 
<212> Type : PRT 
<211> Length : 379 

SequenceName : SEQ ID 296 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd
<400> PreSequenceString :
MGEQYMLTTI LSFLIVTTW AYVSWLKTKG DDLKSSKGYF LAGRGLSGLV IGCSMVLTSL 60
STEQLIGVNA VSYKGNFSVI AWTVPTVIPL CFLALYIIGW L 101
<212> Type : PRT
<211> Length : 101

SequenceName : SEQ ID 297
SequenceDescription :

Sequence



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MKNQHKNPLT KALMKTYPYN HFLFFCFILG AFLLGLLSPA YALSIITTKE IDANLLNGAI 60

ESRWLGKRV FKVEAHGFYF RNNATNSIDI EITSLIiRDNQ SFPLTSSAKT SLKIPPNAKI 120

KKSTILVLKG ENAEEVAKIL GVSKEEYQKL ENIAQTKAAN DPMYANTPFS NGSDSSFYDN 180

NPNSPSNNAI NGKDGANGSN GYGANGNDGV NGISGSNGAN GSHSNNNAIG SGIDTDGVLG 240

VDGVNGSSSS SGGSVGGYEN NFTNHGSTNN NTGGYDNFNN GSSSGGSLGN GGLFPIPFGN 300

GDTNNSNNST NTTSPTNGSS SNNATNPSSQ ENNYSSQYCK VPELSPNNTM KLDVIAKDGS 360

CISMNALRDD TKCAYRYDFE AGKAIKQTQY YYVDRENKTQ NIGGCVDLQG AQYAMQLYKD 420

DSKCALQTTS DKGYGMGKTQ TFQTEIVFRG MDNLIHVAVP CSDYARVQDR IVRYEKNDKT 480

QTLTPIVDQY YNDPNNPNKQ EILNRGIATQ LSSQYQEFAC GQWEYNDAKL EAKRPTMLKS 540

YNKLNGEWVE VTPCNFEAGI KSGAWSPYV MGVPSSKVLS DITTSHYFRI ERKNYGEREQ 600

CQKLYGVNRC QPQYSILILV SPIGAPLTKP LPPJOPLNLIY AQPKIMKNTP QPIILSPLKP 660

PSTGLKAF 668
<212> Type : PRT
<211> Length : 668

SequenceName : SEQ ID 298

SequenceDescription :

Sequence



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MPVIRVLVML ATMMMKLVKT AKEKKVFKNV GISIMGIAFW EAIKDSIKKQ IKKSDWICGN 60
VKTADDYLKT HPNSWFNSAI GVTAITAMLM NVCFADDQSK KEVAQAQKEA ENARDRANKS 120
GIELEQEEQK TEQEKQKTEQ EKQKTEQEKQ KTEQEKQKTE QEKQKTSNIE TNNQIKVEQE 180
QQKTEQEKQK TNNTQKDLVN KAEQNCQENH NQFFXKKLGI KAGIAIEIEA ECKTPKPTKT 240
NQTPIQPKHL PNSKQPHSQR GSKAQELIAY LQKELESLPY SQKAIAKQVD FYRPSSIAYL 300
ELDPRDFNAT EEWQKENLKI RSKAQAKMLE MRSLKPDPQA HLSTSQSLLL VQKIFADVSK 360
EIKWANTEK KVEKAGYGYS KRM 383
<212> Type : PRT
<211> Length : 383

SequenceName : SEQ ID 299






SequenceDescription : 
Sequence 

<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MNYPNLPNSA LEISEQPEVK E I TNEL LKQIi QNALRSNAHF SEQVELSLKC IVRILEVLLS 60 
LDFFKNANEI DSSLRNSIEW LTNAGESLKL KMKEYERFFS EFNTSMHANE QEVTNTLNAN 120 
AENIKSEIKK LENQLIETTT RLLTSYQIFL NQARDNANNQ ITKNKTQSLE AITQAKNNAN 180 

NEISNNQTQA ITNITEAKTN ANNEISNNQT QAITNINEAK ESATTQINAN KQEAINNITQ 240
EKTQATSEIT EAKKTDHYQN IDFFEFE 267
<212> Type : PRT 
<211> Length : 267 

SequenceName : SEQ ID 300

SequenceDescription :

Sequence



<213> OrganismName : Helicobacter pylori J99

<400> PreSequenceString :

MKFFSKDLFK KVTPLFLSVY FLSPTLTQAK SRFYVASQYQ VGKMIMKKYN DLKRTIEGAS 60 
FSLGWEINPT NYWFYSRYYF FMDYGNVILN KRTGAQANMF TYGFGGDLIM EYNKNPLYVF 120 
SLFYGMQVAE NTWTISKHSA NFIIDDWRSI QGFSLKTSNF RMLGLVGFKF QTVLFHHDAS 180 
IEVGIKWPFA FEYDSPFVRL FSVFISHTFY L 211 

<212> Type : PRT

<211> Length : 211 

SequenceName : SEQ ID 301 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MKKFTLSLFL CCTLLNAEED IFRNNTNETD LTNSFEHGKE NNNLIPAKSD SLESFKEQEN 60 
KEKAKQLMDL KALQSVYFSK NRKLQDNNFN VLYVAGNTNK IRLRYAMTTT FIFDNDPIIY 120
VSLGDPSDFE LTYPTNDHYD LSNMLVIKPL LIGVDTNLTV VGASGTIYTIi LFV 173 

<212> Type : PRT 
<211> Length : 173 
SequenceName : SEQ ID 302

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MLDYVPWIGN GYRYGNNHRG SNSSTSGVTT QGQSQNASSN EPAPTFSNVG VGLKANVNGT 60 

LSGSRTTPNQ QGTPWLTLDQ ANLQLWTGAG WRNDKNGQSD ENYTNFASAK GSTNQQGSTT 120 

GGSAGNPDSL KQDKADKSGD SVTVAEATSG DNLTNYTNLP PTSPPHPTDR TRCHSPTRTT 180

PSGCSCSCAA CWAASRCWSI RVGKMITVSL IPPTKNGLTP N 221



<212> Type : PRT 

<211> Length : 221 

SequenceName : SEQ ID 303
SequenceDescription :


Sequence 



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString : 

MDDITAPQTS AGSSSGTSTN TSGSRSFLPT FSNVGVGLKA NVQGTLGGRQ TTTTGNNIPK 60

WATLDQANLQ LWTGAGWRND KTTSGSTGNA NDTKFTSATG SGSGQGSSSG TNTSAGNPDG 120

LQADKVDQNG QVKTSVQEAT SGDNLTNYTN LPPANLTPTA DWPNALSFTN KNNAQRAQLF 180

LRGLLGSIPV LVNKSGQDDN SKFKAEDQKW SYTDLQSDQT KLNLPAYGEV NGLLNPALVE 240

TYFGNTRASG SGSNTTSSPG IGFKIPEQSG TNTTSKAVLI TPGLAWTPQD VGNIWSGTS 300

FSFQLGGWLV TFTDFIKPRA GYLGLQLTGL DVSEATQREL IWAKRPWAAF RGSWVNRLGR 360
VESVWDFKGV WADQAQLAAQ AATSSTTTTA TGATLPEHPN ALAYQISYTD KDSYKASTQG 420

SGQTNSQNNS PYLHFIKPKK VESTTQLDQG LKNLLDPNQV RTKLRQSFGT DHSTQPQPQS 480
LKTTTPVFGR SSGNLSSVFS GGGAGGGSSG SGQSGVDLSP VERVSGH 527
<212> Type : PRT 
<211> Length : 527 

SequenceName : SEQ ID 304 
SequenceDescription :

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MLKLAVGIFI SPTLTRFSTG FNXxAGSVLDQ VLDYVPWIGN GHRYGNNHRG VDDITAPKTG 60
AGSSSGTSTN TSGSRSFLPT FSNVGVGLKA NVQGTLGGSQ TTTTGKDIPK WPTLDPANLQ 120
LWTGAGWRND KASNKQSDEN HTTFKSATGS GQQGGSTTGG SAGNPDSLKQ DKISKSGQNL 180
TTOOGAPQSN STTESASNYD HLE>PNLTPTS DWPNALSFTN KNNAQRAQLF LRGLLGSIPV 240
LVNRSGSDDS NKFQATDQKW SYTDLKSDQT KLNLPAYGEV NGLLNPALVE TYFGTTRAGG 300
SGSNTTSSPG IGFKIPEQNN DSKAVLITPG LAWTPQDVGN LWSGTSLSF QLGGWLVTFT 360
DFVKPRAGYL GLQLTGLDAS DATQRALIWA KRPWAAFRGS WVNRLGRVES VWDLKGVWQD 420
QAQAAAQAAT TAAATGDALP EHIPNALAYQI SSTDKDSYKA STQSSGQTNS QNTSPYLHLI 480
KPKKVENTTQ LDQGLKTCWT PTRFAPSCAK ALVQTIPPKP NPNPSKQPHR CLGRIWTLA 540
VCLWGVLEE QTAPIRWTSP PLNGWVGGLW GNYPVGVGGI WRILKVCKT LLFISIFISI 600
FFLNCSLTLF IWTTASLATG LTWGHFTST TTTLKRQQFS YTRPDEVALR HTNAINPRLT 660
PWTYRNTSFS SLPLTGENPG AWALVRDNTA KGITAGSGSQ QTTYDPTRTE AALTTATTFV 720
LRRYDLAGRC TTSTFRS 737

<212> Type : PRT
<211> Length : 737

SequenceName : SEQ ID 305
SequenceDescription :

Sequence


<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MLDYIPWIGN GHRYGNDHRG SlsTSSTSGVTT QGQQSQNASG TEPASTFSNV GVGLKANVQG 60
TLGGSQTTTT GKDIPKWPTL DQANLQLWTG AGWRNDKASS GQSDENHTKF TSATGSGQQG 120

SSSGTTNSAG NPDSLKQDKV DKSGDSVTVA ETTSGDNLTN YTNLPPNLTP TADWPNALSF 180
TNKNNAQRAQ LFLRALLGSI PVLWKSGQD DSNKFQATDQ KWSYTELKSD QTKLNLPAYG 240
EVNGLLNPAL VEVYGLSSTQ GSSTGAGGAG GNTGGDTNTQ TYARPGIGFK LPSTDSESSK 300

ATLITPGLAW TAQDVGNLVV SGTSLSFQLG GWLVTFTDFI KPRSGYLGLQ LTGLDANDSD 360

QRELIWAPPA LNRLSWQLGQ PLGPRGECVG FQGGVGGSSS VRLASSYKYH HRNEGYLIGA 420

HQCFGLSGEL YRPGFVQGFH SKIiRPKPKHL PLPALGAGEK SRFLW 465
<212> Type : PRT
<211> Length : 465

SequenceName : SEQ ID 306
SequenceDescription :

Sequence



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MLGSIPVLVN RSGSDSNKFQ ATDQKWSYTD LQSDQTKLNL SAYGEVNGLL NPALVETYFG 60

TTRTSSTANQ NSTTVPGIGF KIPEQNNDSK ATLITPGLAW TPQDVGNLW SGTTVSFQLG 120

GWLVTFTDFV KPRAGYLGLQ LSGLNASDSD QRELIWAPRP WAAFRGSWVN RLGRVESVWD 180

LKGWADQAQ LAAQAATSST TTTATGATLP EHPNALAYQI SYTDKDSYKA STQGSGQTNS 240

QNNSLYLHLI KPKKVESTTQ LDQGLKNLLD PNQVRTKLRQ SFGTDHSTQP QPQSLKTTTP 300

VFGAMSGNLG SVLSGGGAGG AGSTNSVDLS PVERVSGSLT INRNFSY 347

<212> Type : PRT

<211> Length : 347

SequenceName : SEQ ID 307
SequenceDescription :

Sequence



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :
MGQQGQSGTS AGNPDSLKQD KISKSGDSLT TQDGNATGQQ EATNYTNLPP NLTPTADWPN 60
ALSFTNKNNA HRAQLFLRGL LGSIPVLVNR SGSDSNKFQA TDQKWSYTDL QSDQTKLNLP 120
AYGEVNGLLN PALVETYFGN TRAGGSGSNT TSSPGIGFKI PEQNNDSKAT LITPGLAWTP 180
QDVGNLWSG TSLSFQLGGW LVSFTDFIKP RAGYLGLQLS GLDASDSDQR ELIWAKRPWA 240

AFRGSWVNRL GRVESWDLK GVWADQAQLA AQAATSEASG SALAPHPNAL AFQVSWEAS 300

AYSSSTSSSG SGSSSNTSPY LHLIKPKKVE STTQLDQGLK NLLDPNQVRT KLRQSFGTDH 360

STQPQSLKTT TPVFGTSSGN IGSVLSGGGA GGGSSGSGQS GVDLSPVERV SGH 413

<212> Type : PRT 
<211> Length : 413 

SequenceName : SEQ ID 308 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :
MGLQLSGLDA SDSDQRELIW AKRPWAAFRG SWVNRLGRVE SVWDLKGVWA DQAHSAVSES 60
QAATSSTTTT ATGDTLPEHP NALAYQISST DKDSYKASTQ GSGQTNSQNT SPYLHLIKPK 120 
KVTASDKLDD DLKNLLDPNE VRVKLRQSFG TDHSTQPQPQ PLKTTTPVFG TNSGNLGSVL 180 
SGGGTTQDSS TTNQLSPVQR VSGWLVGQLP STSDGNTSST NNLAPNTNTG NEWGVGDLS 240 
KRASIESSRL WIALKP 256 
<212> Type : PRT

<211> Length : 256 

SequenceName : SEQ ID 309 

SequenceDescription : 

Sequence



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MRDNTAKGIT AGSGSQQTTY DPARTEATLT TTTFALRRYD LAGRALYDLD FSKLNPQTPT 60 

RDANCQITFN PFGGFGLSGS APQQWNEVKN KYPVEVAQDP TDPYRFAVLL VPRSWYYEQ 120

LQRGLALPNQ GSSSGSGQQN TTIGAYGLKV KNAEADTAKS NEKLQGDESK SSNGSSSTST 180

TTQRGSTNSD TKVKALKIEV KKKSDSEDNG QLQLEKNDLA NAPIKRGEES GQSVQLKADD 240

FGTAPSSSGS GGNSNPGSPT PWRPWLATEQ IHKDLPKOTSA SILILYDAPY ARNRTAIDRV 300

DHUDPKVMTA NYPPSWRMPK WNHHGLWDWK ARDVLFQTTG FDESNTSNTK QGFQKEADSD 360

KSAPIALPFE AYFANIGNLT WFGQALLVFG GNGHVTKSAH TAPLSIWLYI YLVKAVTFRL 420
LLANSLLSKS NIYKKTAN 438
<212> Type : PRT
<211> Length : 438

SequenceName : SEQ ID 310
SequenceDescription :



Sequence 



<213> OrganismName : Mycoplasma pneumoniae 

<400> PreSequenceString :

MRDNIAKGIT AGSNTQQTTY DPTRTEATLT TATTFALRRY DLAGRALYDL DFSKLNPQTP 60
TRDQTGQITF NPFGGFGLSG AAPQQWNEVK DKVPVEVAQD PSNPYRFAVL LVPRSWYYE 120

QLQRGLALPN QGSSSGSGQQ NTTIGAYGLK VKNAEADTAK SNEKLQGYES KSSNGSSSTS 180
TTQRGGSSNE NKVKALQVAV KKKSGSQGNS GDQGTEQVEL ESNDLANAPI KRGSNNNQQV 240

QLKADDFGTA PSSSGSGTQD GTPTPWTPWL TTEQIHNDPA KFAASILILY DAPYARNRTA 300

IDRVDHLDPK VMTANYPPSW RTPKWNHHGL WDWKARDVLL QTTGFFNPRR HPEWFDGGQT 360
VADNEKTGFD VDNSENTKQG FQKEADSDKS APIALPFEAY FANIGNLTWF EQALLVFGIC 420
LS 422
<212> Type : PRT

<211> Length : 422

SequenceName : SEQ ID 311 
SequenceDescription : 



Sequence 


<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MLWPFRWVWW KRVLTSQTRA PAKPNPLTVP PTCTWWSLRK LPNPTKLDDD LKNLLDPNEV 60
RARMLKSFGT ENFTQPQPQP QALKTTTPVF GTSSGNLGSV LSGGGYHAGL KHHQSTVTRS 120
TGEWVDR 127
<212> Type : PRT
<211> Length : 127
SequenceName : SEQ ID 312
SequenceDescription : 



Sequence 

<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MRDNSAKGIT AG5ESQQTTY DPTRTEAALT ASTTFALRRY DLAGRALYDL DFSRLNPQTP 60
TRDQTGQITF NPFGGFGLSG AAPQQWNEVK NKVPVEVAQD PSNPYRFAVL LVPRSWYYE 120

QLQRGLALPN QGSSSGSGQQ NTTIGAYGLK VKNAEADTAK SNEKLQGDES KSSNGSSSTS 180
TTTQRGGSSG DTKVKALQVA VKKKSGSQGN SGEQGTEQVE LESNDLANAP IKRGEESGQS 240
VQLKAADFGT TPSSSGSGGN SNPGSPTPWR PWLATEQIHK DLPKWSASIL ILYDAPYARN 300

RTAIDRVDHL DPKVMTANYP PSWRTPKWNH HGLWDWKARD VLLQTTGFFN SRRHPEWFDQ 360

GQAVADNTQT GFDTDDTDNK KTRLSKGSWL RQAGPDEPPV WSVLRQHWQP HJvVRASAFGV 420

WDLFVLIN 428
<212> Type : PRT
<211> Length : 428 

SequenceName : SEQ ID 313 
SequenceDescription : 


Sequence



<213> OrganismName : Mycoplasma pneumoniae

<400> PreSequenceString :
MFGLKVKNAE ADTAKSNEKL QGAEATGSST TSGSGQSTQR GGSSGDTKVK ALQVAVKKKS 60

GSQGNSGDQG TEQVELESND LANAPIKRGS NPASPTQGSR LRHHPIQFGI WSIRHPHPLK 120

AVACDRANSQ GPPQMIRLDP HSVRCALCL 149

<212> Type : PRT

<211> Length : 149
SequenceName : SEQ ID 314

SequenceDescription : 



Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MFGLKVKDAT VDSSKQSTES LKGEESSSSS TTSSTSTTQR GGSSGDTKVK ALQVAVKKKS 60
DSEDNGQIEL ETNNLANAPI KRGSNNNQQV QLKADDFGTS PSSSESGQSG TPTPWTPWLA 120

TEQIHKDLPK WSASILILYD APYARNRTAI DRVDHLDPKV MTANYPPSWR TPKWNHHGLW 180

DWKARDVLVQ TTGFFNPRRH PDWFDQGQAV AENTQTGFDT DDTDNKKQGF RKQGEQSPAP 240
IALPFEAYFA NIGNLTWFGQ ALLVFGICLS 270
<212> Type : PRT
<211> Length : 270

SequenceName : SEQ ID 315

SequenceDescription :

Sequence



<213> OrganismName : Mycoplasma pneumoniae

<400> PreSequenceString :

MGSQNQGSTT TTSAGNPDSL VTDKVDQKGQ VQTSGQNLSD TNYTNLSPNF TPTSDWPNAL 60 
SFTNKNNAQR AQLFLHGLLG SIPVLVNKSG ENNEKFQATD QKWSYTELKS DQTKLNLPAY 120 
GEVNGLLNPA LVETYFGTTR TSSTANQNST TVPGIGFKIP EQNNDSKAVL ITPGLAWTPQ 180

DVGNLWSGT SFSFQLGGWL VSFTDFVKPR AGYLGLQLTG LDASDATQRA LIWAPPALSG 240

LSWQLGQPVG PRGECVGFEG GVGGSSSVRL ARIYHHRNRG YLTGAPECFG LSGECGGSEC 300

LQAKHELRPN PIH 313
<212> Type : PRT
<211> Length : 313

SequenceName : SEQ ID 316 

SequenceDescription :



Sequence



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MSFGLVGTVN NNGWKSPFRH ETKYRAGYDK FKYYKTHYRG AKKAGTNDDR WRWTAWFDLD 60
FAHQKIVLIE RGELHRQADL KKSDPATNET SKTVWGSIKE KLLQNVNNLH SEKGVFLWFR 120
QSGFTTTRN 129

<212> Type : PRT

<211> Length : 129

SequenceName : SEQ ID 317
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MAEPLAVDPT GLSAAAAKLA GLVFPQPPAP IAVSGTDSW AAINETMPSI ESLVSDGLPG 60

VKAALTRTAS NMNAAADVYA KTDQSLGTSL SQYAFGSSGE GLAGVASVGG QPSQATQLLS 120

TPVSQVTTQL GETAAELAPR WATVPQLVQ LAPHAVQMSQ NASPIAQTIS QTAQQAAQSA 180
QGGSG«?MF*.Q LASAEKPATE QAEPVHEVTN DDQGDQGDVQ PAEWAAARD w<zftf»ioOQQ 240

PGGGVPAQAM DTGAGARPAA SPliAAPVDPS TPAPSTTTTL 280



<212> Type : PRT 

<211> Length : 280

SequenceName : SEQ ID 318
SequenceDescription :


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MRYLIATAVL VAWLVGWPA AGAPPSCAGL GGTVQAGQIC HVHASGPKYM LDMTFPVDYP 60
DQQALTDYIT QNRDGFVNVA QGSPLRDQPY QMDATSEQHS SGQPPQATRS WLKFFQDLG 120

GAHPSTWYKA FNYNLATSQP ITFDTLFVPG TTPLDSIYPI VQRELARQTG FGAAILPSTG 180
LDPAHYQNFA ITDDSLIFYF AQGELLPSFV GACQAQVPRS AIPPLAI 227
<212> Type : PRT

<211> Length : 227

SequenceName : SEQ ID 319 
SequenceDescription : 

Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MKMVKSIAAG LTAAAAIGAA AAGVTSIMAG GPWYQMQPV VFGAPLPLDP ASAPDVPTAA 60
QLTSLLNSLA DPNVSFANKG SLVEGGIGGT EARIADHKLK KAAEHGDLPL SFSVTNIQPA 120
AAGSATADVS VSGPKLSSPV TQNVTFVNQG GWMLSRASAM ELLQAAGN 168
<212> Type : PRT 
<211> Length : 168 

SequenceName : SEQ ID 320 

SequenceDescription : 


Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MTYSPGNPGY PQAQPAGSYG GVTPSFAHAD EGASKLPMYL NIAVAVLGLA AYFASFGPMF 60

TLSTELGGGD GAVSGDTGLP VGVALLAALL AGVALVPKAK SHVTWAVLG VLGVFLMVSA 120

TFNKPSAYST GWALWWLAF IVFQAVAAVL ALLVETGAIT APAPRPKFDP YGQYGRYGQY 180

GQYGVQPGGY YGQQGAQQAA GLQSPGPQQS PQPPGYGSQY GGYSSSPSQS GSGYTAQPPA 240

QPPAQSGSQQ SHQGPSTPPT GFPSFSPPPP VSAGTGSQAG SAPVNYSNPS GGEQSSSPGG 300

APV 303



<212> Type : PRT

<211> Length : 303 

SequenceName : SEQ ID 321 
SequenceDescription : 


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 
MKGPGVSDCV ATVRHDNVFA IAAGLRWSAA VPPLHKGDAV TKLLVGAIAG GMLACAAILG 60
DGIASADTAL IVPGTAPSPY GPLRSLYHFN PAMQPQIGAN YYNPTATRHV VSYPGSFWPV 120
TGLNSPTVGS SVSAGTNNLD AAIRSTDGPI FVAGLSQGTL VLDREQARLA NDPTAPPPGQ 180
LTFIKAGDPN NLLWRAFRPG THVPIIDYTV PAPAESQYDT INIVGQYDIF SDPPNRPGNL 240
LADLNAIAAG GYYGHSATAF SDPARVAPRD ivrrTNSLGA TTTTYFIRTD QLPLVRALVD 300

MAGLPPQAAG TVDAALRP 1 1 DRAYQPGPAP AWPRDLVQG IRGIPAIAPA IAIPIGSTTG 360

ASAATSTAAA TAAATNALRG ANVGPGANKA LSMVRGLLPK GKKH 404
<212> Type : PRT

<211> Length : 404 

SequenceName : SEQ ID 322 

SequenceDescription : 

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

M.QT.T.z-TT • T,pppFDATPN PIEDI.DvOiV.fl MH"V.:,ii7;r.qT- cr^A AQLGEI ""T»wr.r.», «o so 

KAPHCPAKSD QTPAGAAGDG DLPEVGGRVT SPPQPPVAAL TGYSANIGGL SVPHSWIMLPP 120

AVRQVAAMFP GATPMYMTGS SDGSYAGLAA AGLAGTGLAG LAARGGSAPT PAAAAPAGAG 180
GAGPAATRPA AQQTPAVPAA AAGSAIPGLP PGLPPGWAN LAATLAAIPG ATIIWPPSP 240

NANQ 244
<212> Type : PRT
<211> Length : 244

SequenceName : SEQ ID 323 
SequenceDescription : 

Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

MDVALGVAVT DRVARLALVD SAAPGTVIDQ FVLDVAEHPV EVLTETWGT DRSLAGENHR 60 

LVATRLCWPD QAKADELQHA LQDSGVHDVA VISEAQAATA LVGAAHAGSA VLLVGDETAT 120

LSWGDPDAP PTMVAVAPVA GADATSTVDT LMARLGDQAL APGDVFLVGR SAEHTTVLAD 180

QLRAASTMRV QTPDDPTFAL ARGAAMAAGA ATMAHPALVA DATTSLPRAE AGQSGSEGEQ 240

LAYSQASDYE LLPVDEYEEH DEYGAAADRS APLSRRSLLI GNAWAFAVI GFASLAVAVA 300

VTIRPTAAb'K PVEGHQNAQP GKFMPLLPTQ QQAPV^PPPP DDPTAGFQGG TIPAVQrTWP 360

RPGTSPGVGG TPASPAPEAP AVPGWPAPV PIPVPXXIPP FPGWQPGMPT IPTAPPTTPV 420

TTSATTPPTT PPTTPVTTPP TTPPTTPVTT PPTTPPTTPV TTPPTTVAPT TVAPTTVAPT 480

TVAPTTVAPA TATPTTVAPQ PTQQPTQQPT QQMPxQQQTV APQTVAPA^Q PPSGGRNGSG 540

GGDLFGGF 548
<212> Type : PRT
<211> Length : 548

SequenceName : SEQ ID 324

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MKNARTTLIA AAIAGTLVTT SPAGIANADD AGLDPNAAAG PDAVGFDPNL PPAPDAAPVD 60
TPPAPEDAGF DPNLPPPLAP DFLSPPAEEA PPVPVAYSVN WDAIAQCESG GNWSINTGNG 120
YYGGLRFTAG TWRANGGSGS AANASREEQI RVAENVLRSQ GIRAWPVCGR RG 172


<212> Type : PRT 
<211> Length : 172 

SequenceName : SEQ ID 325 

SequenceDescription : 


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :
MTRLIPGCTL VGLMLTLLPA PTSAAGSNTA TTLFPVDEVT QLETHTFLDC HPNGSCDFVA 60
GANLRTPDGP TGFPPGLWAR QTTEIRSTNR LAYLDAHATS QFERVMKAGG SDVITTVYFG 120
EGPPDKYQTT GVIDSTNWST GQPMTDVNVI VCTHMQWYP GVNLTSPSTC AQANFS 176

<212> Type : PRT
<211> Length : 176

SequenceName : SEQ ID 326 
SequenceDescription : 
Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MTPGLLTTAG AGRPRDRCAR IVCTVFIETA WATMFVALIi GLSTISSKAD DIDWDAIAQC 60
ESGGNWAANT GNGLYGGLQI SQATWDSNGG VGSPAAASPQ QQIEVADNIM KTQGPGAWPK 120

CSSCSQGDAP LGSLTHILTF LAAETGGCSG SRDD 154 
<212> Type : PRT 
<211> Length : 154 

SequenceName : SEQ ID 327 

SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MMQQAVSGIT GALGGAVGGV MGPLTQLPQQ AMQAGQGAMQ PLMSALQQTY GAEGLDVADG 60
ARLVDSIEGE PGLGGEPGAG DVGAGGGGGG TTPTGYLGPP PVPTSSPPTT PAGAPAKSVT 120
PDPVSGTPRA SGPAGMTGMP MVPPGALGAG AEGANKDKPV EKRVTGCAEW STGQGPLNST 180
AECSGEICRR QAGGHQVDAT DPCCAERRQG 210 
<212> Type : PRT 
<211> Length : 210 

SequenceName : SEQ ID 328 

SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MIRELVTTAA ITGAAIGGAP VAGADPQRYD GDVPGMNYDA SLGAPCSSWE RFIFGRGPSG 60
QAEACHFPPP NQFPPAETGY WVISYPLYGV QQVGAPCPKP QAAAQSPDGL PMLCLGARGW 120
QPGWFTGAGF FPPEP 135 
<212> Type : PRT 
<211> Length : 135 

SequenceName : SEQ ID 329 

SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MKTTGTTIKL GIVWLVLSVF TVMIIWFGQ VRFHHTTGYS AVFTHVSGLR AGQFVRAAGV 60
EVGKVAKVTL IDGDKQVLVD FTVDRSLSLD QATTASIRYL NLIGDRYLEL GRGHSGQRLA 120

PGATIPLEHT HPALDLDALL GGFRPLFQTL DPDKVNSIAS SIITVFQGQG ATINDILDQT 180 
ASLTATLADR DHAIGEWNN LNTVLATTVK HQTEFDRTVD KLEVLITGLK NRADPLAAAA 240 
AHISSAAGTL ADLLGRIVHC CTAASGTSRA SSSRS 275 
<212> Type : PRT 
<211> Length : 275 

SequenceName : SEQ ID 330

SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MTPRSLVRIV GVWATTLAL VSAPAGGRAA HADPCSDIAV VFARGTHQAS GLGDVGEAFV 60
DSLTSQVGGR SIGVYAVNYP ASDDYRASAS NGSDDASAHI QRTVASCPNT RIVLGGYSQG 120
ATVIDLSTSA MPPAVADHVA AVALFGEPSS GFSSMLWGGG SLPTIGPLYS SKTINLCAPD 180 
DPICTGGGNI MAHVSYVQSG MTSQAATFAA NRLDHAG 217 
<212> Type : PRT 
<211> Length : 217 

SequenceName : SEQ ID 331 

SequenceDescription : 



Sequence 
<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MISTTRIDFL WILSVAFASM IALATLLTLI NQWGTPYIP GGDSPAGTDC SELASWVSNA 60

ATARPVFGDR FNTGNEEAAL AARGFQQGTA PNALVIGWNG HHTAVTLPDG TPVSSGEGGG 120

VRVGGGGAYQ PKFTHHMYLP MDVDAGEDQP PAPDEPVTAV DDVEPEMPAP CPTQRPPVTP 180

RHNLCNKLRT MPGALSAALA AAAPVWPAPI SGCRGFSTSL LAKRNHPVIV GK 232



<212> Type : PRT
<211> Length : 232
SequenceName : SEQ ID 332

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MTTMITLRRR FAVAVAGVAT AAATTVTLAP APANAADVYG AIAYSGNGSW GRSWDYPTRA 60
AAEATAVKSC GYSDCKVLTS FTACGAVAAN DRAYQGGVGP TLAAAMKDAL TKLGGGYIDT 120
WACN 124
<212> Type : PRT

<211> Length : 124 

SequenceName : SEQ ID 333 

SequenceDescription : 

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MAGLNIYVRR WRTALHATVS ALIVAILGLA ITPVASAATA RATLSVTSTW QTGFIARFTI 60
TNSSTAPLTD WKLEFDLPAG ESVLHTWNST VARSGTHYVL SPANWNRIIA PGGSATGGLR 120

GGLTGSYSPP SSCLLNGQYP CT 142 

<212> Type : PRT 

<211> Length : 142

SequenceName : SEQ ID 334
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 

<400> PreSequenceString :

MLTRAIKTQL VLLTVLAVIA VWLGWYFLR IPSLVGIGRY TLYAELPRSG GLYRTANVTY 60
RGITIGKVTG VEPTERGARA TMSIDNGYQI PTDASANVHS VSAVGEQFVD LVSTRTSGPY 120

LRHGQTITTT TVPSQIGPAL DAANRGLAVL PKDRVASVLH EASEAVGGLG SSLNRLIEAT 180
QAIAHDVRGS LEDIDDIIER SAPIIDSQVN SGNEIARWAA NLNTLAAQTA QTDPAVRSIL 240

ANAAPTADQV NATFSDVRES LPQTLANLEV VIDMLKRYHN GVEQALVFLP QSGAIAQSVT 300
TEFPGQAGLG VGGLALNQPP PCLTGFLPAS EWRSPADTST APLPKGTYCR IPMDASNWR 360
GARNNPCVDV PGKRAATPRE CRSNEAYVPG GTNPWYGDPN QMLSCPAPAA RCDQPVKPGQ 420

VIPAPSVNNG INPLPADQLP GTPPPVNDPL QRPGSGTVQC NGQQPNPCVY TPSTFPTTIY 480 
DVQSGKWAP DGWYSVEAS THAGADGWKV MLAPTG 516 

<212> Type : PRT

<211> Length : 516

SequenceName : SEQ ID 335
SequenceDescription :

Sequence



<213> OrganismName : Rickettsia prowazekii 
<400> PreSequenceString : 

MLNNTQFLNL MKSYMKPEFY MSSIKNTTNL DLSSITNTIQ KAMNIFFTTN KISTESMQSL 60
FKKNSEIIQN NINTILNSTK EVINSKDFKQ ATEYHQKCVK SIYETSMDNA KELANIAYEA 120

SNKIFEAANK HITKNIHNAS NNIHNTAEQV QKNFNNKSA 159

<212> Type : PRT 

<211> Length : 159 

SequenceName : SEQ ID 336 
SequenceDescription :

Sequence
<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MNIKLVTYFL ILVSSLKVNA DLNHIQDSFK YQEAEQLTIE LPWNDCTAIH KFLEEKLFFS 60
EQQIKKENKI HEKYKQFYLQ HNNKLSDFSM QFLEKKSEIN SVETLISGFL KFCEDNFQTS 120
KSKSHSLNFF QKQQDQWLHN IRNENYKTYY KKKYEDNTFR NIN 163 
<212> Type : PRT 
<211> Length : 163 

SequenceName : SEQ ID 337
SequenceDescription :

Sequence 



<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MKKLLL I ATA SATILSSSVS FAECIDNEWY LRADAGVAMF NKEQDKATGV KLKSNKAIPI 60 
DLGIGYYISE NVRADLTLGT TIGGKLKKYG AATNTHFTGT NVSVSHKPTV TRLLINGYVD 120 
LTSFDMFDVF VGGGVGPALV KEKISGVSGL ASNTKNKTNV SYKLIFGTSA QIADGVKVEL 180 
AYS WIND GKT KTHNVMYKGA SVQTGGMRYQ SHNLTVGVRF GI 222 
<212> Type : PRT

<211> Length : 222 

SequenceName : SEQ ID 338

SequenceDescription : 

Sequence



<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString : 

MKKNMRKQML KIISIIIISL LLSSCSESTR DENGLLTDSQ STIIRDYIIS QNSKNLKVNL 60 

KEKFGSNLKG VKLIGIKLTN EDLSGIDFTS CEILRTDFMG SNLEKAILTN SVIQESNFAD 120

SVIKNISGYN" ADFQGS I FNN ITLQNTNFVQ SNFSDTAFNK STIINVNFEN SPCFSNVLWCH 180 

SNIDSSNFQK THLKNWSFKN TNVMNSIFYG ADLGKSVINN TNFTNNYFES SDLSNTKFTS 240 

VIIKDSNFTQ SIFNSVNFNN IQSNNSFFSY TSFEDSTLHN IHLTKCDLQN STINSSVFNN 300 

FKIDNAILTN MSLNDNTFNN LSIKNSNTNF VRINKSKGFN ITLLNTNYSN AIFSNNDLKE 360 

FKVINTDLNN SEIINSNFTN GQFNNVNFSQ SLIQNVNFTD VKITLGNLNQ VALINSWLIN 420

TNIINSVLSN SQINNTNYQA YYSFINTNVS NNIVINDNSN QIPPNNIVIN SEKDLQNISN 480 

LANMMLTNFN" LSNLVFNGVD FSKSIFKKAN LTNTVIKNSI LKDANF SAAI LTKTDFSKS I 540 
LTGSIFKFAQ IDQTCFSNSD LTNTDFTEAT IKNTAFDNAN THGIKGLE 588
<212> Type : PRT 
<211> Length : 588

SequenceName : SEQ ID 339 
SequenceDescription : 

Sequence 

<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString : 

MIQKFTNVKL NDMRKILSFL MMCSLHLGLQ SQTWHGDPDS VAALPSIGIQ ESSCTRITFE 60 

WFPGFYSVE KREGNQVFQR ISMPGCGSFG NLGEAELPVL KKMIAVPEFS TANVAVKIKE 12 0 

TETFDNYNIY PNPTYWEEL PEGGTYLVEA FAINNDYYSQ NVSLPSTHYV YSQDGYFRSQ 180

RFIEVTLYPF RYNPVRQEIL FAKKIEVTIT FDNPQPPLQK NTGIFNKVAS SAFINYEADG 24 0 

KSAIENDMVF SRGTTTYISG NVASNLPQNC DYLVIYDDMF NVNQQPHDEI KRLCEHRAFY 3 00 

NGFDVAAVS I KDVLNSFPSN ATSYIMETKL KNFIRSVYNQ SNAKRTLDGK LGYVLLIGKP 3 60 

LSKYLADTDN TKVPTSFIHN* VSLIPSHPTF GSICASDYFF SCVSPLDTVG DLFIGRFSVT 420 

NAHELHNLIE KTINKEISYN PIAHKNILYA EGKGCDAPIL RLFLKEIASG YTVNSILKSN 480

QVSAIDSIFD CLNNGSHHFY FNTHGMPTVW GIGQGLDVNT LTARLNNTSS QGLCTSLSCS 540 

SAVADSTIRS LGEVLTTYAP NKGFSAFLGG SRATQYAVYL EGPCPPSEFY EYLPYSLYHN 600 

LSTWGEMLL SSIINTNSVD TYSKFNFNLL GDPALNIMAH GMEVSNCITL PNNTIISSPI 660 

TIKNGGCLKI PEKGVLHFTN NGSIQVMSGG TLEIGNQAKI SGETGANPTF ITVYGDGLAI 72 0 

NKQVEIDNID RLNLFSTHSV MPKFHFDSVK FNSAPLYTTN CIVEISNCEF TNRSDIISKN 780

CDLSVENSMF SSSGITVFKP MATSSITGLS TKAKITDNTF FATGNFAYHI TNTPGLTATS 840 

NAAIKLDNIP EYYISGNKIV NCDEALVLNN SGNRTNRLHN ITRNVIKNCR IGSTLYNSYG 900 

IYNRNKISNN H I GVRLLNNS CFYFDNAPVI NEEDKQTFIS NRTWQLYSSN GTFPLNFHYN 960 

SLQGGDTDTW IYNDTYTMRY IDVSNNHWGN NDLFDPNQVF NTPDLFIWIP FWDGLPNGRS 1020 

GNSSAEAVEF QTALDCIGNS DYLSAKVALK MMVETYPESD FAIAALKELF RIEKMSGNDY 1080

EGLKDYFRSN PTIISSQMLF PTADFLSARC DIVCENYQSA IDWYENRLNS EISYQDSVFA 1140 

VIDLGDIYWN MQLDSLRGTG IDLNILSCEQ RKSLESHQNV KNYLLSTLPE STGTLLPPLE 1200 
CNKSSLDKSK IISISPNPAK AWTIIYYTD NPSCSVIKIY GINGASADIT GLPKHLSEGY 1260

YSIQFNTSNF DPGFYLVTLN VDQKIIDTEK LRIK 1294 
<212> Type : PRT 
<211> Length : 1294 
SequenceName : SEQ ID 340

SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MQGKNTIVTT GDYSIGLLSQ TSGNLNTDTI IRVNSDGSVT PSFSDGDDTF IVTAGNHAVG 60 

VLACASPGSA CACVSSLDEE STADTGSNEN NAIAKLDMAK GEITTHGTES YAAYANGTW 120 

KAGDTLDYTN ASVTLTDVDI TTHGDNAHAI AARQGTVS FN QGEIYTTGPD AAIAKIYNGG 180 

TVTLKNTSAV AHQGSGIVLE SS1NGQEATV DILSGSSLRS ANEILYHKDE TSNVTITDSE 240

VSSAADVFIN NIKGHLTVDA TNSKITGSAN I S TDDNTHT Y LSLSDNSTWD IKADSTVSNL 3 00 

TVDNSTVYIS RADGRDVEPT RLTITENYVG NNGVLHLRTE LDDDNSATDK WINGNTSGT 3 60 

TRVKVTNAGG SGAYTLNGIE IISVEGESNG EFIKDSRIFA GAYEYSLTRG NTEATNKNWY 42 0 

LTNFQATSGG ETNS GGS SAP TVAPTPVLRP EAGSYVANLA AANTLFVMRL NDRAGETRYI 480 

DPVTEQERSS RLWLRQIGGH NAWRDSNGQL RTTSHRYVSQ LGGDLLTGGF TDSDSWRLGV 540

MAGYARDYNL THSSVSDYRS KGSVRGYSAG L YATWFADD I SKKGAYIDSW AQYSWFKNSV 60 0 

KGDELAYESY SAKGATVSLE AGYGFALNKS FGLEAAKYTW IFQPQAQAIW MGVDHNAHTE 660 

ANGSRIENDA NNNIQTRLGF RTFIRTQEKN SGPHGDDFEP FVEMNWIHNS KDFAVSMNGV 72 0 

KVEQDGVSNIi GEIKLGVNGN LNPAASVWGN VGVQLGDNGY NDTAVMVGLK YKF 773 




<212> Type : PRT 

<211> Length : 773 

SequenceName : SEQ ID 341 
SequenceDescription : 


Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MTKLKLLALG VLIATSAGVA HAEGKFSLGA GVGWEHPYK DYDTDVYPVP VINYEGDNFW 60
FRGLGGGYYL WNDATDKLSI TAYWS PLYFK AKDS GDHQMR HLDDRKSTMM AGLSYAHFTQ 12 0 

YGYLRTTLAG DTLDNSNGIV WDMAWLYRYT NGGLTVTPGI GVQWNSENQN EYYYGVSRKE 180 
SARSGLRGYN PNDSWSPYLK LSASYNFLGD WSVYGTARYT RLSDEVTDSP MVDKSWTGLI 240 
STGITYKF 248 

<212> Type : PRT

<211> Length : 248

SequenceName : SEQ ID 342 
SequenceDescription : 

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKKIALAGLA GMLLVSASVN AMSISGQAGK EYTNIGVGFG TESTGLALSG NWTHNDDDGD 60 
VAGVGLGLNL PLGPLMATVG GKGVYTNPNY GDEGYAAAVG GGLQWKIGNS FRLFGEYYYS 120
PDSLSSGIQS YEEANAGARY TIMRPVSIEA GYRYLNLSGK DGNRDNAVAD GLYVGVNASF 180 

<212> Type : PRT 
<211> Length : 180 
SequenceName : SEQ ID 343

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MTTLTARVFT TAEIIYRKTV IALVCHLNCS RQETVTMNKT IMALAIMMAS FAANASVLPE 60 

TPVPFKSGTG AIDNDTVYIG LGSAGTAWYK LDTQAKDKKW TALAAFPGGP REQATSAFID 120 

GNLYVFGGIG KNSEGLTQVF NDVHKYNPKT NSWVKLMSHA PMGMAGHVTF VHNGKAYVTG 18 0 

GVNQNIFNGY FEDLNEAGKD STAIDKINAH YFDKKAEDYF FNKFLLSFDP STQQWSYAGE 240

SPWYGTAGAA WNKGDKTWL INGEAKPGLR TDAVFELDFT GNNLKWNKLD PVSSPDGVAG 3 00 

GFAGISNDSL I FAGGAGFKG SRENYQNGKN YAHEGLKKSY STDIHLWHNG KWDKSGELSQ 3 60 
GRAYGVSLPW NNSLLIIGGE TAGGKAVTDS VLISVKDNKV TVQN 404
<212> Type : PRT 
<211> Length : 404 

SequenceName : SEQ ID 344 

SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MATGGAALAG KAVMGAAAGA AGGASALQAA FQKASASMET GGDMSSMGSV VSSGGNGGGE 60 
AGTAGSSPFA QAAGFGDSGS S S S GGGFAKA AKLATGTASE LAKGVGS Q VK QGFQERVSET 120 
TGGKLAASIR ESMEPKEASQ SGQFEGNSLG ADSGPDSNEV RS 162 
<212> Type : PRT
<211> Length : 162 

SequenceName : SEQ ID 345 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKRVLIPGVI LCGADVAQAV DDKNMYMYFF EEMTVYAPVP VPVNGNTHYT SESIERLPTG 60 

NGNISDLLRT NPAVRMDSTQ STSLNQGDIR PEKISIHGAS PYQNAYL IDG ISATNNLNPA 12 0 

NESDASSATN ISGMSQGYYL DVS LLDNVTL YDS FVPVEFG RFNGGV IDAK IKRFNADDSK 180 

VKLGYRTTRL DWLTSHIDEN NKSAFNQGSS GSTYFSPDFK KNFYTLSFNQ ELADNFGVTA 240 

GLSRRQSDIT RADYVSNDGI VAGRAQYKNV IDTALSKFTW FASDRFTHDL TLKYTGSSRD 300 

YNTSTFPQSD REMGNKSYGL AWDMDTQLAW AKLRTTVGWD HIS D YTRHDH DIWYTELSCT 3 60 

YGDITGRCTR GGLGHISQAV DNYTFKTRLD WQKFAVGDVS HQPYFGAEYI YSDAWTERHN 420 

QSESYVINAA GKKTNHTIYH KGKGSLGIDN YTLYMADHIS WRNVSLMPGV RYDYDNYLSN 480 

HNISPRFMTE WDI FADQTSM ITAGYNRYYG GNILDMGLRD IRNSWTESVS GNKTLTRYQN 54 0 

LKTPYNDELA MGLQQKIDKN VIARASEAHD QISKSSRTDS ATKTTITEYN NDGKTKTHSF 600 

NLSFELAEPL HIRQVDINPQ IVFSYIKSKG NLSLNNGYEE SNTGDNQWY NGNLVSYDSV 660 

PVADFNNPLK ISLNMDFTHQ PSGLWANTL AWQ EARKARI ILGKTNAQYI SEYSDYKQYV 72 0 

DEKLDSSLTW DTRLSWTPQF LKQQNLTISA DILNVLDSKT AVDTTNTGVA TYASGRTFWL 780 

DVSMKF 786 
<212> Type : PRT 
<211> Length : 786 

SequenceName : SEQ ID 346 

SequenceDescription : 



Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKKTLLAIMD AGTAFAS QAG TLVS QGTEAS ANLTLTKPIV VNNTIQPVKG VYS GTLTAWT 60 
PLATGIVGAS DGQSHDYAVT FPDDIYAESS TSADAVISGD NNPDHKLKVS LTTLEQDPPS 120 
AASEEIGGKR YMMLKNTGTG GAYRWSHMK EQWEPDSYT IRTQAYIYAE 170 
<212> Type : PRT 
<211> Length : 170 

SequenceName : SEQ ID 347 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MGIYHWSRKT KMKRTKSIRH ASFRKNWSAR HLTPVALAVA TVFMLAGCEK SDETVSLYQN 60 

ADD C S AANPG KSAECTTAYN NALKEAERTA PKYATREDCV AEFGEGQCQQ APAQAGMAPE 12 0 

NQAQAQQSSG SFWMPLMAGY MMGRLMGGGA GFAQQPLFSS KNPASPAYGK YTDATGKNYG 180 

AAQ PGRTMTV PKTAMAPKPA TTTTVTRGGF GESVAKQSTM QRSATGTSSR SMGG 234 



<212> Type : PRT 

<211> Length : 234 

SequenceName : SEQ ID 348 
SequenceDescription : 
Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :
MTKMSRYALI TALAMFLAGC VGQREPAPVE EVKPAPEQPA EPQQPVPTVP SVPTIPQQPG 60
PIEHEDRTAP PAPHIRHYDW NGAMQPMVS K MLGADGVTAG SVLLVDSVNN RTNGSLNAAE 120 
ATETLRNALA NNGKFTLVSA QQLSMAKQQL GLSPQDSLGT RSKAIGIARN VGAHYVLYSC 18 0 

ASGNVNAPTL QMQLMLVQTG EIIWSGKGAV SQQ 213 
<212> Type : PRT 
<211> Length : 213

SequenceName : SEQ ID 349 

SequenceDescription : 

Sequence 

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MTKLMQFVQR CYYMTNKKMY FILILVFTLL QVCFFALWKA RDGSTTSLEC TSTLTRNAKT 60 
DHSLYYSANL SVILKKDGSG SFTIVGLTDE DTPRKFSHSY FFTYKIDSNG RISGNAKAKV 120 
SGLENQIKDE NFRLNFLDAS LTGKGNARLS KFNNVYIFSI PGLIINTCAP I 171



<212> Type : PRT 
<211> Length : 171 

SequenceName : SEQ ID 350 
SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MGRISSGGMM FKAITTVAAL VIATSAMAQD DLTISSLAKG ETTKAAFNQM VQGHKLPAWV 60 
MKGGTYTPAQ TVTLGDETYQ VMSACKPHDC GSQRIAVMWS EKSNQMTGLF SAIDEKTSQE 12 0 

KLTWLNVNDA LSIDGKTVLF AALTGSLENH PDGFNFK 157 
<212> Type : PRT 
<211> Length : 157

SequenceName : SEQ ID 351
SequenceDescription : 



Sequence 


<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MKKQFLEKAV FTVAATAATV VLGNKMADAD TYTLQEGDSF FSVAQRYHMD AYELAS MNGK 60 

DITSLILPGQ TLTVNGSAAP DNQAAAPTDT TQATTETNDA NANTYPVGQC TWGVKAVATW 120 
AGDWWGNGGD WASSASAQGY TVGNTPAVGS IMCWTDGGYG HVAYVTAVGE DGKVQVLESN 180

YKDQQWVDNY RGWFDPNNSG TPGSVSYIYP N 211 

<212> Type : PRT 

<211> Length : 211 

SequenceName : SEQ ID 352
SequenceDescription :



Sequence 



<213> OrganismName : Streptococcus mutans UA159 

<400> PreSequenceString :

MSIKNILENK TTTIKVSFAG IATAASLILP MAVQAETTYT VKSGDTLSEI AS THGTTVDK 60 
LAKLNKINNI HLIHAGQILE LDAATEDTDA TPVQESQINE AETSASAKTS QTSEVTTTAP 120 
VQESQTSEVI TSAPAETSQT SEVPTEANQT NEVSSAVSVE TSQTSEATTS APVETSQTSE 180 
ATTAEPTETK TSQTNEVAAS AEENQTTSNT SGLSTSDAAA KEFIAQKESG GNYNAKNGQY 24 0 

YGRYQLSDSY LNGDLSEENQ ERVADAYVSS RYGSWTAAQA FWNANGWY 288
<212> Type : PRT 
<211> Length : 288 

SequenceName : SEQ ID 353 
SequenceDescription : 


Sequence 
<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKCQAFEDFK ATSLNKLSYT TGGATDGEII ANRMLQGKAT KGEITMYTWN IIQNGWVNSL 60 
VSWGIGGYNS SIGYSAQGNR GFSNYPYDVS MDSDNSSSSS NTTGGYVNYN QSFNSGW 117 


<212> Type : PRT 
<211> Length : 117 

SequenceName : SEQ ID 354 

SequenceDescription : 


Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MRYSQICRKS LALLATGMIL TTSTLPSISI LAEDSTGAPA RPDGQAPAGG GANTTTYDYS 60

GINSGVLVAN GSKVTSSSKT KSTTSAQNTA LVQNGGSLTL HKANLIKSGD DNNGDNDNFY 12 0 

GINSILLAVN ERSKAYVSNS KLKASSSGSN GIFATDKATI YANKTSIATT ADNSRGLDAT 180 

YNGNIIANKM AISTKGAHSA AIATDRGGGN ISTTNSSLNT SGSGSPLLYS TGNIQWHVT 240 

GTSSNSQIAG MEGLNTILIH NSNLISTMTN KTASDPIANG VIIYQSQSGD AEATTGQSAH 3 00 

FELSKSKLTS SITSGSMFYL TNTSANIILN QSTLNFDANK AKLLTVAGNS ANNWGTPGSN 360

GATVNFTGHK QTLKGDVDVD SISTLNMYLL DKTNYTGKTA VSTNSTNISP STSPITMNIS 420 

KNSKWVLTGH STVTMLNAEK GAKIVDKDGK TVSVISSSGQ KLVKGKSKYS LTVTGTYSQK 480 

VTTSSSNKPS SSYINRSDFD NYFKTTTAFV NNTKNTSN 518 
<212> Type : PRT 

<211> Length : 518

SequenceName : SEQ ID 355 
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MNKIGDTLRD ARIEKKLSFD DWDKTGIAP HYILAMELDQ LKLLPEGKTN EYLEKYAHAV 60 
GLDPVSIIHG YRNQEMSDEL ILPSSAELAA SSDSNIEKKN EGKSIEEPQE LAIDSLDVTQ 120 

NITEETPQIE DFEVESEEAS KKIEKIPSRL SKYDYDEEPK KKFPWALILL ILLALTIISY 180
VGYWYNQLQ TDSNKTELST STKKSKDTKN DANSTTQSQT SITTDFADGG NNITLSNTNG 240 
KVEVTFTLTG DEESWSATN TTDGESGTTL TATDKTYTVT LAEGS TTS ML TVGSPSGVEI 3 00 

TINGQKVDTT NLVNAGLTNI NLTVQ 325

<212> Type : PRT 

<211> Length : 325

SequenceName : SEQ ID 356
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MKSRKRQRKG LVRKNEIIIL TLFVASAVSL LAFTNSFGVL AKS LHLEKIN KSITISLPFG 60 
KKKMEQTARY YSGEQVQISS SAKKDSLGKG LSHYQNWIGT VKKI KSQKDS RQKHHYSYEV 120 

TFDNGKALKY VQEKDLVKTK RSKYSKGQIV KLKSSATADL DGSSLTDYRA SAGKIDHISY 180
NHSNTTGGYK YDITFDEGGK VTNIQEKDLD KVYEVQLKSE NTAAQNNEIL KQAFAYAKQH 240 
SGTILSLPNG EFKIGSQTPD KDYITLTSDT EIRGDNTTLL VEGSAYWFAF ATGTSASDGV 3 00 

KNFTMRNINI KASDLEKGNQ FMI MADHGDN WKICNNSFTM VHKKGSHIFD LGSLQNSAFE 360 
GNQFTGYAPE LTNVSKIDDN ADLHDFYSEV IQLDAAESSG VWDGGLIKAI DPNYENYNKE 420 

KQLCNNITIA NNSFVPYIDS HGKIIAYSGT IGQHSSDVGL VKIYDNVFSN SLVSRFNQNG 480
KSEAWIFKAI HLKSNYNNAV YANSIS 506 
<212> Type : PRT 
<211> Length : 506 

SequenceName : SEQ ID 357 

SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MRKLKVALFA SSILGMLAVS SYTAADTEDN QVTISHYNEQ AGTFDVNAVQ AANGKTIQS I 60 
DVAIWSEENG QDDLKWYHAS NDGSNQLTVH FNAENHGSKV GSYIAHAYIT YTDGNRVGVN 120 
LGKRKLSLSA PQLSLKQGGL QLFSKLKPSA ADQLFSAVWS DENGQDDLHW YTADADGNTL 180

AGYANHfCGYG TYHVHTYLKQ NGKMIPISAQ DIDIPKPKVK IQIDKINDTS YDWVNNVPP 240 

YISSVAIPVW SEQNGQDDLK WYQATKVADG IFKTTVYLKT HRFELGNYQA HIYGDSQLSK 3 00 

KLDGLGETHF NVPSIINYED PQVTIDHYNI NKGTFDVTVA ETDNSKAIQS ISAAVWSDAN 3 60 

QANIiYWYEAK QLANGKAAIT VDVQKHGNQT GSYNVHVYVH YNDGTTSGHV LANQQLNQIV 420 

HYQPSAVRIT AYMNEKNTYP VGQCTWGVKE LAPWIPNWLG NGGQWAS TVA VKGFKIGTVP 480 

KVGAIACWSD GGYGHVAYVT HVESNNRIQV KEANYKNQQY ISNFRGWFDP TTSYLGRLTY 540 

IYPD 544 
<212> Type : PRT 
<211> Length : 544 

SequenceName : SEQ ID 358

SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MANNYSRRQQ PTKKTKGTSR KRPTEHIKTG FSALQKSVAI IAGILGIITA LITINNYRNS 60 
SHNDKKD S TS KTTIIKEKEV DDSNSNNNAA NSQAENDSNN NNNSAESNQN QTATTANDSN 12 0 

SNSANQWQAN SQSQANNQQN QNNANAGQ 148 
<212> Type : PRT 
<211> Length : 148 

SequenceName : SEQ ID 359 

SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MKIFSFGTIR NNTALKPNYD DTTAFSGFGT IRNNTAQKQS TNCASWFNRF GTIRNNTALK 60 
LTILING-VSF CFGTIRNNTA LKPRGPIFVS TFRNRAIHLS QISASK 106 
<212> Type : PRT 
<211> Length : 106 

SequenceName : SEQ ID 360 

SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKRKRNIjYFIi IGLFLTVFLL XGCSMQKKTK SESSSTSQKT TLQTKQSSEK STDAKQTTEA 60 
HSESSQSSSH SNNEETIiAP I DTGAVLKADY SSMAGTWKNE EGQTLTFDQR GLTTPGMTVS 120 
LLNIDQDGNL LLNVETGTKK NLTLYIVPAN KTLSNQYFSN GQSDESDKTK DRIVSSESLN 180 
SGKFTNR.VYY HVSTH 195 
<212> Type : PRT 
<211> Length : 195 

SequenceName : SEQ ID 361 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MTPKKIKIAL TALISLMLAL FLFLFNHHSV RENSQQEKLK ISKASSKKSQ TSTSSVMTSS 60 
RKATEQXSQA QTQSQSQAEQ SNPNVILPIP QELVGTYKGS SPQASEITFT ISSNGQLRAQ 120 
ANFDPASDIN DVTATVS GVR KVGADTYIWE FVSGSSAALL PGVTGIGGLG KMQPGFILKG 180 
GQLTPIMFTG SVDGEIDYSH PNPYPVSLNK Q 211 
<212> Type : PRT 
<211> Length : 211 

SequenceName : SEQ ID 362 

SequenceDescription : 



Sequence 

<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MKKIINVIVL SLSVFFLIAC SNSSTGEKTS QSSEETKVRL IVKTDSNKTD EKVAFKKGAT 60 
VMDVLKDNYK VKESGGFITT IDGVTQDKKA GRYWMFDVND KLASKAADKI KVKNGDKIEF 120

YLKVYKGKN 129 
<212> Type : PRT 
<211> Length : 129 
SequenceName : SEQ ID 363

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MSNKPWEEKV TDATTDNEEM TRNSKDASII STPILTILLS LFFLIIIGIL FFVLYTSNGG 60 
SNEKAAT S GF YSSSKTVKKA KNEANSQTDE QTTEAETSSS ETTSSSSDSD GET I T VQGGE 120 
GAAAIAAR&G ISVDKLYELN PEHMTHGYWY ANPGDNIKIK 160 
<212> Type : PRT

<211> Length : 160 

SequenceName : SEQ ID 364

SequenceDescription : 

Sequence



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MPDNRMNYSI DSNMQFPLVE ITLETGEFAY IQRGSMVYHT PSVTLNTKVN GRGS GLGKLV 60 

GAIGRSVTSG ESFFITQAVS NASDGKLALA PSMPGQVIAL ELGEKQYRLN DGAFLALDGS 120

AQYQMKAQSV GRALFGGQGG LFVMTTEGQG TLLANSFGSI KKIELQNQEI TIDNAHWAW 180 

SRDLNYDIHL ENGFMQSIGT GEGWNTFRG TGEIYVQSLN LQQFAGVLQG FITNTNR 237 



<212> Type : PRT 
<211> Length : 237

SequenceName : SEQ ID 365 
SequenceDescription :

Sequence 


<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MKKNYFWYGL LGLLALYLIT IAFIPGFHIF FSNMLMLALF FMLIALSNRS IFFFFLALGF 60 

LSIYLKDIFH FDYSTGPLFT GIIIIGVILN SFLKPHYSYS YKGNHYFNMK QHANYIDNET 120 
DVFLKTLFSE NTSYVTSQEL NKIIIDTKFG EQSVDLSQAQ FMTDSPEIHI DVSFGETNLR 180

IPNNWKIINK THSPFASISF SGFPSTNGDF INVTLTGTVA MGSLNIQY 228

<212> Type : PRT 

<211> Length : 228 

SequenceName : SEQ ID 366 
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 
<400> PreSequenceString :

MKSITKKIKA TLAGVAAL FA VFAPSFVSAQ ESSTYTVKEG DTLSEIAETH NTTVEKLAEN 60 
NHIDNIHLIY VDQELVIDGP VAPVATPAPA TYAAPAAQDE TVSAPVAETP WSETWSTV 120 
SGSEAEAKEW IAQKESGGSY TATNGRYIGR YGSWTAAKNF WLNNGWY 167 
<212> Type : PRT 
<211> Length : 167

SequenceName : SEQ ID 367 
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus pneumoniae R6 
<400> PreSequenceString : 

MKHSHKKSFD WYSMQQRYSI RKYYFGAASV LLGTALVLGA AASVQTVQAE ENKQETTNSI 60 
SVGRGEAATK PAEVSASNKE KTYAAPTVAN PVETTPVKTE EVTKPAEKVE EAKDKKEEVT 120 
HQDAVDKSKL LTALSRAKKL ESKLYTEASA ANLQTSIQAG QSLLGKADAT EAELSAAESS 180

IQSFIIGLEL RSNSNKETVS ETPVAKKADA VESKEGAKPA ATTERSAVDS AILPTSTADK 240 
VETTSAPASI NEILKLGLSL SDARQNPAIR KEDVNRGYSG FRAASNPANP IVSGSGNTVA 300 
FADISQGGRS YSFRGYGNSR GGNSIHYDVT TVRSGNSVNF TISYSAPGDS REFVNNNFIL 360

DKGDGB'GNPS NAT ITS SNPR VREQSKSISQ GANYVSHSGY SMTSAISTNT EQTIRFSLPI 420 

INLNGDLSVR LKPVTFNVDQ GGGGAATSND PYSNSNYYYR ANPLYLDANP YGGTNNKTVS 480 

EDIDFQTVYIj PTSKLPEGQT RLVREGEKGQ RQITYKVHRF GNETLLGLPI SNSVTKEAKP 540 

5 RIMQIGVAKD LIDTVKPRVD QNKVGDTNNL TFYLDNDGNG VYTEGVDELV QKIAIKDGAK 600 

GEKGDQGERG LTGAKGEKGD RGERGLTGAQ GAKGEKGDRG ERGLTGAQGA KGE KGDRGER 660 

GLTGAQGAKG- EKGDRGERGL TGAQGAKGEK GAQGERGLTG AQGAKGEKGD QGERGLTGAQ 720 

GAKGEKGDQG ERGLTGAQGA KGEKGAQGER GLTGTQ GAKG EKGDRGERGL TGAQGAKGEK 780 

GDRGERGLTG AQGAKGE KGA QGERGLTGAQ GAKGEKGDQG ERGLTGAQGE KGDRGERGLT 840 

10 GAKGEKGDQG ERGITGAKGE KGAQGERGLT GAQGAKGEKG DQGERGLTGA QGE KGAQGQA 900 

GRDGVTPTVT VKDNKNDGTH TITINDGRGN VTSTWRDGF DGAS PLVATQ RNDADKTTTV 9 60 

I F Y YDKNGlSnST ELDASDKKLK EWIADGAKG EKGDKGEQGL QGRDGEQGPK GEDGKTPTVK 1020 

VTDGQDGTHT ITINDGKGGI TTTWRDGFD GASPLVSTHR NEADKTTTVJ FYYDLNDNNQ 1080 

FDEGDTKTjKE? WIADGKQGP KGDKGDHGKD GFTPEVTVTD NNNGTHTITI TQPDNRPS-LT 1140 

15 TIVKNGEDGBC TPKVKAERDD AKKQTTLTFY IDKDGDGSYT AGKDELVQTT WKDGQDGAA 12 00 

GASGRDGKEV LNGKVDPTTE GKDGDTFVNT QTGDVFVKKG NTWEPAGNIK GPKGDKGADG 12 60 

AKGEKGAQGE RGLTGAQGVK GEKGDQGERG LTGSKGEKGD QGERGLTGAQ GAKGD KGEQG 13 20 

LQGRDGAQGP KGADGQRGPA GPQGPKGEQG NPGTPGKDGK S L I AVKNGVL VTITPVEGRP 13 80 

QTTFVEDGQEC GADGKTP TVT ITEGQNGTHT LTVHNPGSPD VTTTIRDGAT GQAGRDGKDV 1440 

20 LNGKVNPQPN" QGKNGDKYIN I ETGDVYVKN NGNWDKEGNI KGP KGDKGAD GAKGEKGDQG 150 0 

ERGLTGAQGA. KGAD GAVGRD GRDGKDVLNG KANPEAHQGK DGDKYVNTET GDVFVKNNGN 1560 

WDKEGNI KGP KGDKGAD GAK GEKGDRGERG LTGAQGAKGA DGAAGRDGRD GRDGKDVLNG 1620 

KVNPEANQGK DGDKYVNTET GDVFVKNNGN WDKEGNI KG S KGDKGERGED GKTPEVTVTP 1680 

GKDGHSTDIT FTVPGKDPVT VNVKD GENGL NGKTPKVDLL RVQGKNGNPS HTIVTFYTDE 1740 

25 NNDGKYTPGT DELLGSEMIK DGAKGADGRD GKS LLTVKDG KETKVYQEDP ANPGQPLNPE 18 0 0 

KP LAVI RDGV D GKS P TV TAV RKDEAGHKGV EITVDNHDGS QPTTVFVQDG AKGKTGATGQ 1860 

DGQTPTITTQ RGQDGQSTW TITTSGKDPV TFTVKDGKNG KDGRAPKIKV EDITSPSRIR 192 0 

RDTDAAATPT RNGIRVTVYD DVNDNGVYDE GVDKVLNSKD IYNGIDGRDG SAPTI TTKDN 1980 

GDGTHT I TVQ NPDGSESTTV VKDGKDGKTA NITTTENPDG SHTITVTNPD GSTKETWKN 2040 

30 GKDGKTPKVE VTDNNDGTHT VKVTDGD GNV TNAIIKDGKD GKAATATTTE NPDGSHTVTI 2100 

TNPDGTKNEF WKNGRDGVD GRTPTASVRD NGDGSHTIVI TNPEGVTTET TVRDGKS PKV 2160 

TITDEQNGTH KISVLNGDGT TTETXIKDGK SPVATVRDNQ DGTYTIRVEN GNGTVS ETTV 2220 

RD GKS PTAKV VDNGDGTHT I TWNSDGITT TTTVRDGREP KLEVIDNNDG SHTIKVTGAD 2280 

GKGTTTTI FD GKSPKANIVD NGDGTHTLTI VDSDGREYKS IIKDGKDGKD SVSPTVTVKN 2340 

35 NNDGTHWTX TNPDGSKTEM VIKDGKDGKS PKVSVEDNGD GSHTITIINS DGTVTKTVIK 2400 

DGKDGRDGRO GRDGKDGKDG KCGCQDKPVT PSNDKPVPPT PNVPTPEVPV KPVPAQPTPN 2460 

VPTPEVPVQP TPAVSTPEVP VKPVPAVPEQ PWPTPAQPA TPVNANPVAP TTGKENRGDK 2520 

LPETGSQSDY ISVLLGSGIL LSLYVGRRKE D 2551 
<212> Type : PRT 

<211> Length : 2551

SequenceName : SEQ ID 368
SequenceDescription :

Sequence 

<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString :

MKKRMLLASX VALSFAPVLA TQAEEVLWTA RSVEQIQNDL TKTDNKTSYT VQYGDTLSTI 60 

AEALGVDVTV LANLNKI TNM DLIFPETVLT TTVNEAEEVT EVEIQTPQAD SSEEVTTATA 12 0 

50 DLTTNQVTVE) DQTVQVADLS Q P I AEAPKE V ASSSEVTKTV IASEEVAPST GTSVPEEQTA 180 

ETSSAVAEEA. PQETTPAEKQ ETQTSPQAAS AVEATTTSSE AKEVAS SNGA TAAVSTYQPE 240 

ETKIISTTYE APAAPDYAGL AVAKS ENAGL QPQTAAFKEE IANLFGITSF SGYRPGDSGD 3 00 

HGKGLAI DFM VPERSELGDK IAEYAIQNMA SRGISYIIWK QRFYAPFDSK YGPANTWNPM 3 60 

PDRGSVTENH YDHVHVSMNG 380 

<212> Type : PRT

<211> Length : 380 

SequenceName : SEQ ID 369
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pneumoniae R6 
<400> PreSequenceString : 

MTILGKDTVQ QSAKGESVTQ EATPEYKLEN TPGGDKGGNT GSSDANANEG GGSQAGGSAH 60 

65 TGSQNSAQSQ ASKQLATEKE SAKNAIEKAA KNKQDE I KGA PLSDKEKAEL LARVEAEKQA 120 

ALKEIENAKT MEDVKEAETI GVQAIAMVTV PKRPVAPNAA PKTTSAPQAT AGTMQDVTYQ 180 

SPAGKQLPNT GSASSAALAS L GL WATS GF ALLGRKTRRR K 221 
<212> Type : PRT

<211> Length : 221 

SequenceName : SEQ ID 370
SequenceDescription :


Sequence 



<213> OrganismName : Streptococcus pneumoniae R6

<400> PreSequenceString :
MMTTGCSMGA YHALNFFLQH PDVFTKVIAL SGVYDARFFV GDYYNDDAIY QNSPVDYIWN 60

QNDGWFIDRY RQAEIVLCTG LGAWEQDGLP SFYKLKEAFD KKQIPAWFAE WGHDVAHDWE 12 0 

WWRKQMPYFL GNLYL 135

<212> Type : PRT 

<211> Length : 135 
SequenceName : SEQ ID 371

SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString :

MNKGLFEKRC KYSIRKFSLG VASVMIGATF FGTSPVLADS VQSGSTANLP ADLATALATA 
KENDGHDFEA PKVGEDQGSP EVTDGPKTEE ELLALEKEKP AEEKPKEDKP AAAKPETPKT 
VTPEWQTVEK KEQQGTVTIR EEKGVRYNQL SSTAQNDNAG KPALFEKKGL TVDANGNATV 

DLTFKDDSEK GKSRFGVFLK FKDTKNNVFV GYDKDGWFWE YKSPTTSTWY RGSRVAAPET
GSTNRLSITL KSDGQLNASN NDVNLFDTVT LPAAVNDHLK NEKKILLKAG SYDDERTWS 
VKTDNQEGVK TEDTPAEKET GPEVDDSKVT YDTIQSKVLK AVIDQAFPRV KEYSLNGHTL 
PGQVQQFNQV FINNHRITPE VTYKKINETT AEYLMKLRDD AHLINAEMTV RLQWDNQLH 
FDVTKIVNKN QVTPGQKIDD ERKLLSSISF LGNALVSVSS DQTGAKFDGA TMSNNTHVSG 

DDHIDVTNPM KDLAKGYMYG FVSTDKLAAG VWSNSQNSYG GGSNDWTRLT AYKETVGNAN
YVGIHSSETaTQ WEKAYKGIVF PEYTKELPSA KWITEDANA DKKVDWQDGA IAYRSIMNNP 
QGWKKVKDIT AYR I AMNFGS QAQNPFLMTL DGI KKINLHT DGLGQGVLLK GYGSEGHDSG 
HLNYADIGKR IGGVKDFKTL IEKAKKYGAH LGIHVNASET YPESKYFNEK ILRKNPDGSY 
SYGWNWLDQG INIDAAYDLA HGRLARWEDL KKKLGDGLDF IYVDVWGNGQ SGDNGAWATH 

VLAKEINKQG WRFAIEWGHG GEYDSTFHHW AADLTYGGYT NKGINSAITR FIRNHQKDAW
VGDYRSYGGA ANYPLLGGYS MKDFEGWQGR SDYNGYVTNIi FAHDVMTKYF QHFTVSKWEN 
GTPVTMTDNTG STYKWTPEMR VELVDADNNK VWTRKSNDV NSPQYRERTV TLNGRVI QDG 
SAYLTPWNWD ANGKKLS TDK EKMYYFNTQA GATTWTIiPSD WAKS KVYIi YK LTDQGKTEEQ 
ELTVKDGKIT LDLLANQPYV LYRSKQTNPE MSWSEGMHIY DQGFNSGTLK HWTISGDASK 

AEIVKSQGAU DMLRIQGNKE KVSLTQKLTG LKPNTKYAVY VGVDNRSNAK ASITVNTGEK
EVTTYTNKSIi ALNYVKAYAH NTRRNNATVD DTSYFQNMYA FFTTGSDVSN VTLTLSREAG 
DEATYFDEIR TFENETSSMYG DKHDTGKGTF KQDFENVAQG IFPFWGGVE GVEDNRTHLS 
EKHDPYTQRG WNGKKVDDVI EGNWSLKTNG LVSRRNLVYQ TIPQNFRFEA GKTYRVTFEY 
EAGSDNTYAF WGKGEFQSG RRGTQASNLE MHELPNTWTD SKKAKKATFL VTGAETGDTW 

VGIYSTGNAS NTRGDSGGNA NFRGYNDFMM DNLQIEEITL TGKMLTENAL KNYLPTVAMT
NYTKESMDAXi KEAVFNLSQA DDDI SVEEAR AEIAKIEALK NALVQKKTAL VADDFASLTA 
PAQAQEGLAN AFDGNLSSLW HTSWGGGDVG KPATMVLKEA TEITGLRYVP RGSGSNGNLR 
DVKLWTDES GKEHTFTATD WPDNNKPKDI DFGKTIKAKK IVIiTGTKTYG DGGDKYQSAA 
ELIFTRPQVA ETPLDLSGYE AALAKAQKLT DKDNQEEVAS VQAS MKYATD NHLLTERMVE 

YFADYLNQLK DSATKPDAPT VEKPEFKLSS VASDQGKTPD YKQEIARPET PEQILPATGE
SQFDTALFLA SVSLALSALF WKTKKD 
<212> Type : PRT 
<211> Length : 1767 

SequenceName : SEQ ID 372

SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 

<400> PreSequenceString :

MKLYNKSELR YSRIFFDKRP PAFAFILIIS TAIILSGALV GAAYI PKNYI VKANGNSVIT 60 

GTEFLSAISS GKWTLHKS E GDMVNAGDVI ISLSSGQEGL QASSLNKQLV KLRAKEAIFQ 120 

KFEQSLNEKY NRMSNSGEEQ EYYGKVEYYL SQLNSENYNN GTQYSKIQDE YTKLNKITAE 180 

RNQLDADLQT LQNELIQLQQ QGDSPSLSDT TSADDKAKLE TKILEITTKI EALKTNITSK 240 

NSEIDSQQSN IKDMNRTYND PTSQAYNIYA QLVSELGTAR SNNNKSITEL EANLGVATGQ 300

DKAHS ILAPlsT EGTLHYLVPL KQGMSIQQGQ TIAEVSGKEK GYYVEAFVLA SDISRVSKGA 3 60 

KVDVAITGVlSr SQKYGTLKGQ VRQIDSGTIS QETKEGNISL YKVMIELETL TLKHGSETW 420 

LQKDMPVEVR IVYDKETYLD WILEMLSPKQ 450
<212> Type : PRT 
<211> Length : 450 

SequenceName : SEQ ID 373 
SequenceDescription :

Sequence 






<213> OrganismName : Neisseria meningitidis Z2491 

<400> PreSequenceString :

MNKGLHRIIF SKKHSTMVAV AETANSQGKG KQAGSSVSVS LKTSGDLCGK LKTTLKTLVC 60 

SLVSLSMVLP AHAQITTDKS APKNQQWIL KTNTGAPLVN IQTPNGRGLS HNRYTQFDVD 120 

NKGAVLNNDR NNNPFLVKGS AQLILNEVRG TASKLNGIVT VGGQKADVI I ANPNGITVNG 18 0 

GGFKNVGRGI ' LTIGAPQIGK DGALTGFDVR QGTI-TVGAAG WNDKGGADYT GVLARAVALQ 240 

GKLQGKNLAV STGPQKVDYA SGEISAGTAA GTKPTIALDT AALGGMYADS ITLIANEKGV 300

GVKNAGTLKA AKQLIVTSSG RIENSGRIAT TADGTEASPT YLSIETTEKG AAGTFISNGG 3 60 

RIESKGLLVI ETGEDI SLRN GAWQNNGSR PATTVLNAGH NLVIESKTNV NNAKGSANLS 420 

AGGRTTINDA TIQAGSSVYS STKGDTELGE NTRIIAENVT VLSNGSIGSA AVI EAKDTAH 480 

IESGKPLSL.E TSTVASNIRL NNGNIKGGKQ LALLADDNI T AKTTNLNTPG NLYVHTGKDL 540 

20 NLNVDKDLSA ASIHLKSDNA AHITGTSKTL TAS KDMGVE A GLLNVTNTNL RTNSGNLHIQ 600 

AAKGNIQLRN TKLNAAKALE TTALQGNIVS DGLHAVSADG HVSLLANGNA DFTGHNTLTA 660 

KADVNAGSVG KGRLKADNTN ITSSSGDITL VAGNGIQLGD GKQRNSINGK HI S IKNNGGN 720 

ADLKNLNVHA KSGALNIHSD RALSIENTKL ESTHNTHLNA QHERVTLNQV DAYAHRHLSI 78 0 

TGSQIWQNDK L P S ANKL VAN GVLALNARYS QIADNTTLRA GAINLTAGTA LVKRGNINWS 840 

25 TVSTKTLEDN AELKPLAGRL NIEAGSGTLT IEPANRISAH TDLSIKTGGK LLLSAKGGNA 900 

GAPSAQVSSL E AKGN I RIiVT GETDLRGSKI TAGKNIjWAT TKGKLNIEAV NNSFSNYFPT 960 

QKAAELNQKS KELEQQIAQL KKSSPKSKLI PTLQEERDRL AFYIQAINKE VKGKKPKGKE 1020 

YLQAKLSAQN IDLISAQGIE ISGSDITASK KLNLHAAGVL PKAADSEAAA ILIDGITDQY 10 80 

EIGKPTYKSH YDKAALNKPS RLTGRTGVSI HAAAALDDAR IIIGASEIKA PSGSIDIKAH 1140 

30 SD IVLEAGQN DAYTFLKTKG KSGKI IRKTK FTSTRDHLIM PAPVELTANG ITLQAGGNIE 1200 

ANTTRFNAPA GKVTLVAGEE LQLLAEEGIH KHELDVQKSR RFIGIKVGKS NYSKNELNET 12 60 

KLPVRWAQT AATRS GWDTV LEGTEFKTTL AGAD IQAGVG EKARVDAKI I LKGIVNRIQS 132 0 

EEKLETNSTV WQKQAGRGST IETLKLPSFE SPTPPKLSAP GGYIVD I PKG NLKTEIEKLS 13 80 

KQPEYAYLKQ LQVAKNINWN QVQLAYDRWD YKQEGLTEAG AAIIALAVTV VTSGAGTGAV 1440 

35 LGLNGAAAAA TDAAFAS LAS QASVSFINNK GDVGKTLKEL GRSSTVKNLV VAAATAGVAD 1500 

KIGASALNCJV SDKQWINNLT VNLANAGSAA LINTAINGGS LKDNLGDAAL GAIVSTVHGE 1560 

VASKIKFNLS EDYITHKIAH AIAGCAAAAA NKGKCQD GAI GAAVGE I VGE ALTNGKNPAT 1620 

LTAKEREQ XL AYSKLVAGTV SGWGGDVNT AANAAKVAIE NNLLSQEEYA LREKLIKKAK 1680 

GKGLLSLDWG SLTEQEARQF IYLIEKDRYS NQLLDRYQKN PSSLNNQEKN ILAYFINQTS 1740 

40 GGNTAWAASI LKTPQSMGNL TIPSKDINNT LSKAYQTLSR YDSFDYKSAV AAQPALYLLN 1800 

GPLGFSVKAA TVAAGGYNIG QGAKAISNGE YLHGTVQWN GTLMVAGSVS AQAAISAKPA 1860 

PVTRYLSNDS APALRQALTA ESQRIRMKLP EEYRQ I GNLA IAKIDVKGLP QRMEAFSSFQ 1920 

KGEHGFI S LP ETKIFKPISV DKYHNIASPP RGTLRNIDGE YKLLET I AQQ LGNNRNVSGR 1980 

IDLFTELKAC QSCSNVILEF RNRYPNIQLN IFTGK 2015 

<212> Type : PRT

<211> Length : 2015 

SequenceName : SEQ ID 374 
SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MDLIQTPNKQ FVDGDRRTPG TPVPAWWLNQ LQGELYSILN AVG I EPNKAD HAQVLSAIKT 60 

55 LAADASQVAS IDALRKYSGT GYVNVNAYHA NTTVGGGVFV ADKADKS TAD NGCTVIVSTD 120 

GTRWKRVFSG MLNLHDFGYV ASKNNALSTL NAAESAALDV WDCLGLSID TGNIYPQKNK 18 0 

YTNGKFVING KTVDVQYQPI RSGIGRFISG TGAAANLKSN EWTGAGLIVI GEGAMEQMEK 240 

CVSSIAIGDR AQGFSKVSRD NIAIGADSLI NVQAATEWYD QSRMEGTRNI GIGGNAGRGI 3 00 

TSGYSNVSIG RNAGQGLGEG SSNIALGAGA MAGTAP VGF S GDIEVFWPSS TSRTIAIGEA 360 

60 VLQTYQGRAA QTAI GANAAR NTKKAEKVTA IGSAAMENLE RNRAPNGGDV WTGTEAGTY 420 

AQSGKMITDT FPNIRGAQAT YWVGIRLTSG TAQTLQNDW PAQWSVNGN TLIIQSSKEL 4 80 

TATGAAELKY VYSVNSTATK NEELTI IGAN AMNKALTAGY STI IGVDAAL LGDNYQKTTA 540 

IGASSLRTGS HISTTAIGYW VIPLASSEKC VAIGDSAGYR NVQGDFLTGK ITNSIAIGYG 600 

ARINGDNEIQ I GTTGQTLYA PTAVNIRSDG RDKADVKPLT NGLDFVMKLK PMTGYYDRRD 660 

65 SYVDELFKDL PADERADKVR EWWANPIKDG SHKEDRLRHW FIAQDIAALE DEYGRLPMVN 720 

KTNDTYTVEY ETFIPVLTKA IQEMAARIET LETEMKESKK 760 
<212> Type : PRT 
<211> Length : 760

SequenceName : SEQ ID 375 
SequenceDescription :

Sequence 

<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MKNISRKCFM TSWCIILGG ILLGAGYATG GLQDIKHQTA PKKVIKTFDQ ITALDIDSSA 60 

STITVETGPV QRPTVTYYTH PKFIDPIVTT LTGKTLSLSQ KPKDIVITGG IEILGFTLNN 12 0 

SRQEKNYRSI TITVPEKTSL NEVKGSNVPH TTLSNLTVQD MQFDGNLTLL HTKVKKAT I T 18 0 

GMLEATKSQL TNLELKADYS FSNLTDS S VE NGT I SLGMGQ LTTKDTTLKA INIQSLHPGG 24 0 

IEAERTTLEN VTFTVSKSKE EEEENDYYDN DAIFTAHALT LKGTNTISGG DIDVDITLTK 3 00 

AKAIAYRART ENGKVSLGSQ LT !PAKI GKE S TSDVISYVAE NKAATGNLTV NLNKGDI T I K . 360 

<212> Type : PRT 
<211> Length : 360 

SequenceName : SEQ ID 376

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString :

MFKKENLKQR YFNFGLVALA LTIIiAIIFAF SSKNADTKSY AKKSESKMVT IDKAPKNNHA 60 
ITKEESKEKA KSIASEPIPT VENSVAPTVT EEAPWQQEV TQTVQQVSSV AYNPNNWLS 12 0 

NGNTAGIVGS QAAAQMAAAT GVPQSTWEHI IARESNGNPN AANASGASGL FQTMPGWGST 18 0 

ATVEDQVNAA LKAYSAQGLS AWGY 204

<212> Type : PRT 
<211> Length : 204 

SequenceName : SEQ ID 377

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString :
MLEELKTLIK NPKLMITMIG VALVPALYNL SFLGSMWDPY GRVNDLPIAV VNHDKPAKRA 60
DKSDTIGNDM VDKMSKSKDL DYHFVSSKSA QKGLKKGDYY MVITLPEDLS QRATTLLNPE 120
PQKLTIRYQT SKGHGMVAAK MGETAMAKLK ESVSQNITKT YTSAVFSSMT DLQSGLKEAS 180
TGSQALDSGA KTAQMGSQML SDISTLAGLSSA SWQFQQGTNR LTSGLTAYTA GVSQVKDGLG 240
QLSTDMPVYL NGVSRLSQGA SQLNQGLSQL TQSTTLSDDK AKRIQSLEVG LPVLNQGIQQ 300
LNENLSTMQV PKLNTDELGN NKAAIAQAAQ QLLVKEAAAH KEQLAVLQAT SAYQSLTAEQ 360
QGELTAALTQ TDKGEAVAPA QTXLRSVQTL STSLQSLSQE DQSKQLEQLK EAVAQIANQS 420
NQALPGASSA LTELSTGLAK VUGSLNQQVL PGSNQLTTGL AQLNRYNTAI GSGVIKLSEG 480
ANALSSKSGE LLDGSHQLSE GATKLADGSS QLSQGGHQLT SGLTELSTGL SILNGSLAKA 540
SQQLSLVSVT DKNAKAVAKP LVLNEKDKDG VKTNGIGMAP YMIAVSLMW ALSTNVIFAN 600
SLSGRPVKDK WDWAKQKFVI NGFISTMGSI VLYLAIQLLG FEARYGMETL GFIMLSGWTF 660
MALVTALVGW DDRYGSFASL VMLLLQVGSS GGSYPIELSG AFFQKLHPFL PMTYWSGLR 720
QTISLSGHIG VEVKVLTGFL LAFMVLSLLI YRPKKTV 757
<212> Type : PRT
<211> Length : 757

SequenceName : SEQ ID 378

SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MSRDPTYTIN EHDLSFADGR FYVTFKADKS SETVRLNSSC LGNTIIKKLQ VEDDNTMHDF 60

VKPKVTTQQA FGLAQQVKEL DLQLKDPKSD LWGKIKFNNK AMLVEYANKE MSSAIAQSAE 120

QILLQVKSID DERYSKFEQT LHGIKQTVKS ESVESARTQL ASMFDSRISG LDGKYSRLSQ 180

TIDSLSSRLD DGVGNYSTLS QPCVSGIDLRV SNAANDVSRL SQTAQGLQSQ ITNANQNYSS 240

LSQTVQGLQT TVRDNQSNAT SRINQLSDLI STKVSKGDVE TTIAQSYDKI AFAIRDKLPA 300

SKMSGSEIIS AINLDRSGVK ITGKNITLDG NSYISNAVIK DAHIANMDAG KINTGYLNAN 360

RIATEAITGE KIKMDYAFFN KLTANEGYFR TLFAKDIFAT SVQSVTLSAS KITGGVLAAT 420

NGASQWDLNN ANMTFNRDAT IKTFNSKNNAL VRKDGTHTAF VHFSNATPKG YRGSALYASI 480
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GITSSGDGID SASSGRFAGL RSFRYATGYN HTAAVDQTEL YGDNVL I ADD F S I NRGFKFR 540 
PDKMEKVLDM NDLYAAWAL GRCWGHLANV GWNTAHSNFT SAVSRELNNY ITKI 594 

<212> Type : PRT 
<211> Length : 594

SequenceName : SEQ ID 379
SequenceDescription :

Sequence


<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MAADGKVTIL VDVDGKQVKV LNSELDKVAK HGDKGSSSLK KFAVGAGVFK LASAAVDLVS 60
QSLGKAITRF DTLEKYPRVM KAMGHSAEDV ARSTDKI'ANG IDGLPTTLDE WGTAQRLTS 120
ITKDINKSTN LTLALNNAFL ASGASSEAAS RGLEQYAQML SAGKVDMQAW KTLQETMPYA 180
LQQTAEAFGF AGASAQKDFY EALKNGQITF DQFSNKLIEL NDGVGGFAEL AKENSKGIET 240
SFNNIKNAIA KGVANSIKAL DDLSKAATGK GIADHFDSLK WINASFSAI NASIKASTPL 300
FKLLFSVIGA GISWKAIiSP ALVGVASGLA AMRAVNETIT MIKALNRAWV MASASMSIGA 360
TTIKTVTAVQ AVSTTMTKAD MVARLSQLGV LKASTVIYGV MTGAISLSTA ATIASTAAVT 420
ALKAALVALT GPVGWWGAI GALVAVGVSL WSWLTKESDE TKKLKKEQEG LVESNKQLRD 480
SVREGVQERK KGLESVKEST AAHQKLADEI IKLAAKENKT AGEKQNLKNK IDQLNGSIDG 540
LNLAYDKNSN SLSHNADQIK SRISAMEAES TWQTAQQNLL NIEQKRSEVS KKLAENADLR 600
KKWNEEANVS DSVRKEKIAE LTEEEAKLKN MQTQLQEEYN KTSATQQAAA DAMAAAEESG 660
SARQVIAYEN MSEAQRTAID NMRTKYSELL ETTTSIFDAI EQKTALSVDQ MNTNLEKNRA 720
ATEQWATNLE ILAQRGVDQG ILEQLRRMGP EGATQTQVFV DATDAELAPL QENFRAATET 780
AKNAMGSVLD SAGVEMPEKV KGMVTNVSTG LQAELQAANF AQLGQEIPNG VSQGISQGAG 840
KASDASVKMG QEVKRSFQGE LGIHSPSRVF TEYGGHITDG LSNGVTNGTS KVMQTMQSLA 900
QQMSQKGQQI VNDMRSKSNQ ITDAFSTMSG PMHSHGVNAM QGLANGIYAG SGAALAAAQS 960
IAARITATIQ SALDIHSPSR VMRDEVGRFI PQGIAVGIDA DRKVTDSSMQ KLKESMTINA 1020
TPEIASGFGG GVAGIANQTT NNSNNSFTLN VKVDESDGNS HEKYQRLFRE FSWYIQQQQG 1080
RLGDVK 1086
<212> Type : PRT
<211> Length : 1086

SequenceName : SEQ ID 380

SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MAKEPWEEKI VDDTIGTRTR KSRNAFISTP WLTALLSVFF VIIVAILFIF FYTSNSGSNR 60 
QAETNGFYGA STHKKTRKAS NAKKTSSSST TTDTTPSSEE TLASSEGTGE TLTVLAGEGA 120 
ASIAARAGIS VEQLQALNPE HMTQGYWYAN PGDQVTIK 158 
<212> Type : PRT 
<211> Length : 158

SequenceName : SEQ ID 381
SequenceDescription :

Sequence

<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString : 

MSKRGKIKIT TKTKLITASV ITLVLIITGV VLWKQQQNTL TADIAKEPYS TVSVTEGSIA 60
SSTLLSGTVK ALSEEYIYFD ANKGNDATVT VKIGDQVTQG QQLVQYNTTT AQSAYDTAVR 120
SLNKIGRQIN HLKTYGVPAV STETNKDEAT GEETTTTVQP SAQQNANYKQ QLQDLNDAYA 180
DAQAEVNKAQ IALNDTWIS SVSGTWEVN NDIDPSSKNS QTLVHVATEG QLQVKGTLTE 240
YDLANVKVGQ SVKIKSKVYS NQEWTGKISY VSNYPTESNA GSTTPAGSTG AGSSTGAAYD 300
YKIDIISPLN QLKQGFTVSV EWNEAKQAL VPLTAVIKKD KKHYVWTYDD ATGKAKKVEV 360
TLGNADAQQQ EIHKGVAVGD IVIANPDKNI KPDKKLEGVI SIGTNTKPEK DSQSKNKKSG 420
VDK 423

<212> Type : PRT
<211> Length : 423

SequenceName : SEQ ID 382
SequenceDescription :



Sequence 
<213> OrganismName : Treponema pallidum
<400> PreSequenceString :

MLRLPTARAC ITMGTMIRHT FTHRCGALLC ALALGSSTMA ATAAAKPKKG QMQKLRQRPV 60
WAPTGGRYAS LDGAFTALAN DASFFKANPA GSAJSTMTHGEL AFFHTTGFGS FHAETLSYVG 120
QSGNWGYGAS MRMFFPESGF DFSTTTEPVC TPASNPIKQR GAIGIINFAR RIGGLSLGAN 180
LKAGFRDAQG LQHTSVSSDI GLQWVGNVAK SFTSEEPNLY XGLAATNLGL TVKVSDKIEN 240
CTSTCEKCGC CKERCCCNGK KACCKDCDCN CPCQDCNDKG TVHATDTMLR AGFAYRPFSW 300
FLFSLGATTS MNVQTLASSD AKSLYQNLAY SIGAMFDPFS FLSLSSSFRI NHKANMRVGV 360
GAEARIARIK LNAGYRCDVS DISSGSGCTG AKASHYLSLG GAILLGRN 408
<212> Type : PRT
<211> Length : 408

SequenceName : SEQ ID 383

SequenceDescription :

Sequence



<213> OrganismName : Treponema pallidum 
<400> PreSequenceString : 

MSRTFRAWQC VGALCALSPL LPAYSSEGVR EVPPSQSPQV WAYEPIRPG DQLLKIGIVA 60 
GCQLYIAGGN GTNGSSSSGT NGNGNGKLLG GGGFHLGYEY FFTKNFSLGG QVSFECYRTT 120

GSNYYFSVPI TVNPTYTFAV GRWRIPLSLG VGLNIQSYLS KKAPGLIAEA SAGLYYQYTP 180 

DWSIGGIVAY TQLGDIASSP DKCRAVGLAT IDKGVRYHF 219 

<212> Type : PRT 

<211> Length : 219 
SequenceName : SEQ ID 384

SequenceDescription : 



Sequence 


<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgataaatt taagtaagga agcaacggtg ggcjaaagcat taacccctat tgctatactt 60
atgatgttgt cttttcctgt agcttctcaa gcggcgggat tagtcataaa aaatggaacg 120
gtatataacg ccaatggtgt gccagtcgtt gacatcaaca aacctaacgg tagcggttta 180
tctcataata tctgggataa cctaaacgtt gataaaaatg gtgtcgtttt caataatagc 240
gctaatgaat ccagtacttc acttgccgga aatattcagg gaaacagtaa tctgacctcc 300
gggtcggcga aggtgatcct gaatgaggtt acttccaaaa atccttcaac cattaatggg 360
atgatggaag ttgcagggga taaagcggat ctgattattg ccaacccgaa tggtattact 420
gtaaacggtg gcggttcaat caatacaggt aaacttacct taaccaccgg gacgccggat 480
atccaggatg acaagctggc cggttactcc gtgaacggcg gtaccattac gctcggtaaa 540
ctggataacg ccagcccgac agaaattctg tcccgtaacg tggtagttaa cggcaaagtg 600
tctgccgatg agctgaacgt tgttgctggc aataactatg ttaatgccgc aggccaggtg 660
accggtagcg tatccgccac ggggtcccgt aacggttaca gcgtagatgt tgccaaactg 720
ggcggaatgt atgcgaacaa aatcagtctg gtcagcaccg agaaaggtgt gggggttcgc 780
aacctcggcg ttattgctgg gggtgttaat ggtgtcagca tcgattccaa aggtaacctg 840
ttaaacagta acgcccagat tcagtctgca agcacgatca acctgacaac aaatggtact 900
ctggataaca ccaccggtac ggtgacatct gtaggcacta tctcgcttaa taccaacaag 960
aatactatcg tgaatacccg tgcgggtaac atctctacga tgggcgatat ctacgttaac 1020
agcggtacga ttgacaatac taacggcaag cttgcggctg caggaatgct ggcggttgat 1080
accaataacg ccacgctgat taactctggt aaagggagtt ctgtcgggat tgaagcgggg 1140
ctcgtggcgc tgaaaaccgg aacgctcaac aacagcaatg gtcagattcg cggtggctat 1200
gtgggtcttg aatccgctgc gctgaataac aacaacggtg atatccagac caccggcgat 1260
atcgccatta tcagtaacgg taatgtggat aacaacaaag gtctgatccg ttcgtccacc 1320
gggcatatcg ttattggcgc ggcaggtagc gtaaataatg gttcaaccaa aaccgccgat 1380
accggcagtt ctgactctct gggcattatt gcagataccg gcgtagaaat tggtgcgaac 1440
aacatcaata acaacggcgg acagattgcg tctaatggca acgtctccct gtcaagttac 1500
agcacgatcg acgactatgc gggcaaaatt ctgtccaaca gcaaagtgat tatcaaggga 1560
agctctctgc gtaacgatac cggggggatc agcggtaagc agggtattga agtcgccgtt 1620
ggcggcagcc tgaccaataa tattggcgtg atcagctctg aagagggtga tatctccctg 1680
ttagccaact ccgtggataa ccacggcggc ttcatgatgg ggcagaacat cacgatggag 1740
tcgatgtctg gcgtcaataa caacacagcg ctgatcgtgg ccagcaaaaa actgaagata 1800
aatgcgcgcg gcagtatcga aaaccgcgat ggcaataact tcggtaatgc ttatggtctg 1860
tacttcggca tgcctcagca aacgggtgga atcjgtcggca aggaaggcat cgagctttcc 1920
gggcagaaca tctataacaa caacagccgt cttatcgctg aggatggtcc tctgactctg 1980
caggcgcaga acacgttcga caacacgcgt gctctggtca ccagcggggc ggatgcatct 2040
attcaggttg gcggaacgta ttacaacaac tacgctacca cctggagtgc gggcaacctg 2100
gatatcgacg cgaccacgct gcaaaacagc agcagcggta cgatgatcga taacaatgcg 2160
accgggttca tagcatctga taaaaacctg tcactggaag tggtgaatag ccttaccaac 2220
tacggctgga tcagcggtaa aggcgatgtt gatgtcacgg tgaataacgg caacctgtat 2280
aaccgcaata ccattgcggc tgaaaagggg ctggatafctg ccgcgttgaa cggtattgaa 2340
aactggaagg atatttctgc tggcggcgac ctgacgatga acaccaatcg ccatgtgacc 2400
aacaactcca acagcaatat ggtggggcag aatattgtta ttaacgcggt taacgatatc 2460
aacaaccgtg gcaacattgt cagtgacgct gacctgaacg tgacgaccaa aggcaacctg 2520
tataactatc tctatatggt agggtatggg gatatcgcat tgtcggcaaa tagcgtggcg 2580
aacaataacg cgaccatcga agcgacaggc gatctgatta tcgattcgaa gggtaacgtg 2640
ggtaacaacc gcggtaatct gcatgcgttg aacggcgtgt tgtctgttaa aggcaacaafc 2700
ctgaacaacg ataacggtga aattcgtggt tatggcgatg tcacgctggc actgacgggc 2760
aactacgaca gctataaggg ttcgctgacc tctgaaacgg gcgacgtgac tctgacggcg 2820
aacattgtag acaacgccta tggtttgatt gccggtgaga atgtttctgt cgatgctaaa 2880
tcgacgattt acaacaacac tgcgctgatc gcggcgaatp, aa^agctggt tattaacgct 2940
ggcggcaacc tcgaaaaccg cgacgggaat aacttcctgc gtaataacgg cgcgctgttt 3000
ggaattaccg acaacgttgg cggcatcgta ggfcaaagaag gtgtcacgct ttctgctcag 3060
aacgtctaca acaataacag cagcatcatc gctgaaaatg gtccgcttaa tcfcgctgtcc 3120
aggggaacgc tggataatac ccgcgcgctt cttagcagtg gggctgatgc catcatccgt 3180
gcggcaggga cgttctacaa caactatgcc accacgtaca gcgccggtaa tctcgacgtt 3240
tatgcggcgt cgttgaacaa cgccagogat ggtcgcctgg aagacaatac cgccacgggc 3300
gtgattgcgt ctgacaaaaa cctggafcctg agcgttgata acagtgtcac taactatggt 3360
tggatcagcg gtaaaggaga tgtgcatttc aatgttctga aaggcacgct gtataaccgt 3420
aatgccatcg cggcggacaa cgcgctcgacc attaatgccc tgaacggtgt tgagaacttt 3480
aaagacattg tggcgggtac tgcgctgact attgatacgc agaagtatgt taccaacaac 3540
agcaacagta atatgttggg acaaaccatc gcgatcaatg ccgtgaatga cattaataac 3600
cgtggaaata ttgtgggtga ttattctctg ggtgttaaaa ccaccggtaa tatttataac 3660
tacctcaata tgctgagtta tggtgtcgct ggcgtatcgg caaataaggt tacgaatagc 3720
ggtaaagacg ctgttctcgg tggcttctac ggtttagcgt tagaagcaaa cgaaactgat 3780
aacaccggta ctattgtcgg catgtaa 3807

<212> Type : DNA

<211> Length : 3807

SequenceName : SEQ ID 385
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

gtgaacacaa tacacttgcg ctgtctcttc aggatgaatc ccctggtctg gtgcctgtgg 60
gctgatgttg cagcaaagct aaggtcgctt aaacgctact cagtattcac ttttcagagg 120
atgaaattta tgaacaggac cagtccctat tattgtcgtc gctcagtact ttccttattg 180
atatctgcct tgatatatgc cccgcccggg atggctgcct tcactcctga tgttattggt 240
gtggtaaacg atgagactgt agatggcagc caacgagtag atgaacgagg tacaacaaat 300
aacactcata ttatcaacca tggccagcag aatgtttatg gcggggtatc taatggaagt 360
cttattgaat ctggtggata tcaagatgta ggaaggcata acaattatgt ggggcagtct 420
aataatacca ccattaacgg gggcagacag tcaattcatg acgggggtat ttccacaggt 480
acgataatcg agagtggcaa tcaggacgtt tataaagggg gtatcagcaa tggaacgaca 540
attaagggcg gtgcttcacg cgtagaggga gggagtgcga atggaacact cattgatggt 600
ggtagccaga tagtaaaagt tcaagggcat gctgatggta caacgataaa taagtctggc 660
tctcaggacg tagtacaagg aagtctggca acgaacacaa ccataaatgg tggtcgacag 720
tatgttgaac agagcacagt agaaacaacc accatcaaaa atggcggtga gcaaagagta 780
tatgagagcc gtgcgctgga cacgacgatt gaaggcggaa ctcagtctct gaatagtaag 840
tcaacggcaa aaaatactca gatctattct ggtggtacgc aaattattga taacaccagc 900
tcctcggatg ttattgaagt ttattccggt ggcgtgcttg atgttagtgg tggtacggca 960
acaaatgtta cccagcacga tggtgcaatt ttaaaaacta acactaacgg tacgacggtg 1020
agcggtacga atagtgaagg tgcattctcc atccacaatc acgtggcaga caatgtgttg 1080
ctggaaaacg gtggtcattt agacataaac gcatatggtt cggcaaacaa gacgattatt 1140
aaagataaag gaacaatgtc agttttaacc aatgctaaag ctgatgcgac ccgaatagat 1200
aatggcgggg ttatggatgt tgcaggaaac gcgacaaata ccataattaa tggtggcaca 1260
cagaatatta ataattatgg catagccaca ggcaccaata tcaacagcgg aacgcaaaat 1320
atcaaaagcg gcgggaaagc tgacacaaca attatatcct ccgggagccg gcaggttgtt 1380
gagaaagatg gtacggcaat tggcagcaat attagcgccg gaggctcgct gattgtctat 1440
accggcggta ttgcacatgg ggttaaccag gagacgggca gtgctttagt tgccaacacg 1500
ggtgcaggga ctga^atcga aggatacaac aagctctctc acttcactat taccggaggg 1560
gaggctaatt atgttgtgct ggaaaatacc ggcgaactga cggtagtggc taaaacctcg 1620
gcgaaaaata ctaccattga tgctggcggt aagctgattg tccagaagga ggctaaaaca 1680
gatagcacca gacttaataa tggcggcgtt ctggaggttc aggacggtgg tgaggctaag 1740
catgttgagc aacaatccgg cggcgcatta attgcttcca cgacctccgg aacacttatc 1800
gaaggaacca acagttatgg tgatgctttc tacatcagga attcagaagc taaaaatgta 1860
gtgctggaaa acgctggctc attaacagtc gtcactggtt cccgggcagt tgacacgatt 1920
attaatgcca acggcaaaat ggatgtttat ggaaaagatg ttggcactgt actcaatagt 1980
gctggcaccc aaacaatata tgccagtgcc acttctgata aagcaaatat caaaggtggc 2040
aagcaaacgg tatatggttt agccactgaa gcaaatatcg aaagtggtga acaaattgtt 2100
gatggtgggt caacagagaa aacacacatc aatggtggca cgcaaaccgt tcagaattat 2160
ggtaaggcga tcaataccga tatcgtctct ggcctacaac aaattatggc aaacgggaca 2220
gcggaaggtt ccattattaa tggcggttca cagatagtta atgagggcgg tctggctgaa 2280
aactcggtgc ttaatgatgg cggcacactc gatgtgcggg agaaaggcag cgcaacgggg 2340
atacagcaga gtagccaggg cgcgttggtt gcaaccacca gggcgacgcg ggtcacagga 2400
acacgcgcgg atggcgtcgc gttcagcatc gagcagggtg cggcgaacaa tatcctgctg 2460
gcaaatggcg gagtgttaac cgtggagtca gacacctctt ctgacaaaac acaggtcaat 2520
acgggcggac gggagatcgt caaaacaaaa gccactgcga c^ggcacgac gctcaccggc 2580
ggtgaacaaa ttgtcgaggg tgtggcgaat gagacaacaa ttaacgacgg cggaatacaa 2640
acagtttcag ctaacggaga ggcaataaaa acaacgatca atgaaggcgg tacgctgaca 2700
gtcaacgata atggcaaagc gacagatatc gtccagaaca gcggtgccgc tctccagacg 2760
agcacggcta acggtattga aatcagcggt actcaccagt acggcacttt ttccatttcc 2820
ggcaatttag cgaccaatat gttgctggaa aatggcggta atttattggt attagcaggt 2880
accgaagctc gcgactccac ggttggcaag gggggggcaa tgcaaaacca gggtcaggac 2940
tccgccacaa aggttaactc tggtgggcaa tatacccttg ggcggtcaaa agatgagttt 3000
caggctctgg cccgggcaga agatctccag gttgctggcg ggacagcaat cgtctacgca 3060
ggtacgctgg cggatgcatc ggtcagtggc gcgacaggaa gcctgtcgtt aatgacgcca 3120
cgggataatg ttacgccagt taaactcgaa ggggcgatcc ggattaccga tagcgcgaca 3180
ttaactatcg gcaatggcgt tgatacgacg cttgccgacc tgacggctgc cagccggggc 3240
agtgtctggc ttaacagcaa taattcctgt gcaggcacca gcaactgcga gtatagagta 3300
aacagtttgc tacttaacga cggtaatgtt tatttatcag cacaaacagc agcgcctgcc 3360
acaactaacg gtatatacaa tacgctgaca accaatgaac tttccggtag cggtaatttc 3420
tacctgcata ccaacgttgc aggctctcgg ggcgatcaac tggtcgtcaa caacaacgcc 3480
actggtaatt ttaaaatctt tgttcaggat accggcgtca gtcctcagtc tgacgacgcg 3540
atgacgctgg tgaaaacagg gggaggggat gcttcgtttt cgctgggcaa tactggcggt 3600
ttcgttgatc ttgggaccta tgagtatgtc ctgaaaagcg atggcaacag caactggaac 3660
ctgaccaatg atgtcaaacc caacccggat cccaacccaa atcccaaccc aaatccgaag 3720
ccggatccaa aaccagaccc aaaaccggat ccgaaaccag acccgactcc cgagccaacg 3780
ccgacacccg ttccggagaa acgcatcacg ccttctaccg cagccgtact caatatggca 3840
gcaacattac cgttggtatt tgatgctgag ctaaacagta ttcgcgagcg gttgaacata 3900
atgaaagcga gtccacacaa caataatgtc tggggggcga cgtataacac ccgtaataat 3960
gtcaccaccg afcgcgggggc cgggtttgag cagacgctga ccggaatgac agtggggatc 4020
gacagcccta atgatattcc tgaggggatt gcgacgctgg gcgcttttat gggttattcc 4080
cattcacata tcggttttga tcgcggagga catggcagtg tgggcagtta ttctctgggc 4140
ggctatgcca gttgggaaca tgaaagtggt ttctatctgg acggtgtcgt gaagctgaac 4200
cgttttgaaa gtaacgtagc cggtaaaatg agcagcggtg gagccgccaa tggcagttac 4260
cacagcaacg ggctgggcgg tcacattgaa accgggatgc gatttaccga tggtaactgg 4320
aacctgacgc cgtatgcatc gttaacgggg ttcaccgctg ataaccccga atatcattta 4380
tccaatggca tggaatcgaa atcagtcgat acccgcagta tatatcgtga actgggcgca 4440
acgctgagtt acaacatgcg tctggggaac ggtatggaaa ttgagccgtg gctgaaggcg 4500
gctgtgcgca aagaatttgt cgatgataac cgggtgaagg tgaataatga cggtaatttc 4560
gtcaatgafct tgtcgggcag acgtggaata taccaggcag gtattaaagc ctcattcagc 4620
agtacgttaa gcgggcatct tggggtgggg tatagccatg gtgccggtgt ggaatccccg 4680
tggaacgcgg tagctggtgt gaactggtcg ttctga 4716
<212> Type : DNA
<211> Length : 4716

SequenceName : SEQ ID 386
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atggcagtaa agatttcagg tgtactgaaa gacggcacag gaaaaccggt agagaactgc 60
accattcaac tgaaagccag acgtaacagc gccacggtgg tggtgaacac ggtggcctct 120
gaaaatccgg atgaagccgg tcgttacagc atggacgttg agtacggtca gtacagcgtt 180
attctgttgg tggaagggtt cccgccgtca catgccggga ccatcaccgt gtatgaagat 240
tctcaacccg gtacgctgaa tgattttctt ggtgccatga ctgaggatga tgtccgtccg 300
gaggcactgc gccgttttga gctgatggtg gaagaggtgg cgcgtaacgc gtccgcggtg 360
gcacagaaca cggcagccgc gaagaagtca gccagcgatg ccagcacatc agecegtgag 420
gcggcaaccc atgcgactga tgctgcggac tcagcacgccj cagccagcac gtcagccgga 480
caggccgcgt cgtcggctca gtcagcgtct tccagcgcag gaacggcatc aacaaaggct 540
actgaagcat caaaaagtgc tgccgctgca gagtcctcaa aaagcgcggc ggctaccagt 600
gccggtgcgg cgaaaacgtc agaaacgaat gcggcagtgt cacaacaatc agccgccact 660
tctgcatcca ccgcgaccac gaaagcgtca gaagctgcct cctcagccag ggatgcgtcg 720
gcttcaaaag aggcggcaaa atcatcagaa acgagcgcag cctcgacgcgc cagtagtgca 780
gcctcctcgg caacggcggc aggcaattcc gcgaaggcgg ccaaaacgtc tgagacaaac 840
gctaagtcct ctgaaacggc agcagaacag agtgcctccg cagcagcagg ctcaaaaaca 900
gcggctgcat tatctgccag tgccgcgtca acaagtgccg ggcaggcctc agccagtgcc 960
accgccgccg gaaaatcggc agaaagtgcc gcatcgtctg cttcaacagc cacaacgaag 1020
gctggcgaag ccactgaaca ggccagcgca gcagcgagtt ctgcttccgc agcgaagaca 1080
tccgaaacga acgcgaaagc gtcggaaacc agcgcagaat cctcaaaaac ggctgccgca 1140
tcgtcagcca gttcggcggc gtcatcggca tcatctgcgt ctgcttcaaa agatgaggcg 1200
accagacaag cgtcagcagc gaagagcagc gccacgacgg catccacgaa ggcgacagag 1260
gcagctggta gtgcgacggc agcagctcag agcaaaagta cggcggaatc tgcagcaacg 1320
cgcgctgaga cagcggcaaa acgggcagag gatattgcat ccgccgtggc gcttgaggat 1380
gcgagcacga cgaaaaaggg gatagtacag ctcagcagtg cgactaacag cacttccgag 1440
tcactggcgg caacgccaaa agccgttaag gccgcgtatg agctggctaa cgggaaatac 1500
accgcacagg atgcaacgac agcacagaaa gggatagttc agcttagcaa cgcgaccaac 1560
agcacatctg aaatgctggc ggcaacgcca aagtcggtaa aggcagccta tgaccttgct 1620
aacgggaaat atactgctca ggacgctacg acagcacaaa aaggaattgt ccagctcagt 1680
agtgcaacca acagcgcatc tgaaacgctt gccgcgacac cgaaagcagt gaaagcagct 1740
aatgataatg cgaatggtcg ggtaccttct gcccgtaagg tgaatggtaa ggcgctttca 1800
tcggatataa cactgacgcc gaaagatatt ggtacgctta actcaacaac aatgtcattc 1860
agcggtggtg ctggttggtt caaattagca acggtaacca tgccacaggc gagttctgtt 1920
gtttcaatta cgttgattgg tggcgcggga tttaacgtgg ggtcacctca acaggcaggt 1980
atatctgaac ttgttttgcg tgcaggtaat ggtaatccga aggggattac tggtgcttta 2040
tggcagcgca catcgacagg gtttacaaat tttgcctggg tcaatacatc tggtgatact 2100
tacgatattt acgfctgcaat cggaaattat gcgactggtg taaatattca atgggattat 2160
accagtaatg ccagcgtgac gattcatacg tcaccagcat attctgctaa taagccggaa 2220
gggttaacgg acggtacagt ttattcactc tatacgccat cagagcagtt ttatccgcct 2280
ggcgcaccaa tcccgtggcc atcagatacc gttccgtctg gctatgccct gatgcagggg 2340
cagacttttg acaaatctgc atacccgaaa cttgcagccg cttatcrcgtc aggcgtgatc 2400
cctgatatgc gtggctggac gattaagggc aaacctgcca gtggtcgggc cgtattgtct 2460
caggaacagg acggcattaa atcgcacacc cacagcgcca gcgcatccag fcacggatttg 2520
gggacgaaaa ccacatcgtc gtttgattac ggcactaaat ccacgaiataa caccggggcg 2580
cacacgcaca gtgtgagcgg tacagccgca agtgccggaa accatactca tagfcgtcaca 2640
ggcgcatcag cagtcagcca gtggtcacaa aatgggtcag tacataaggt agtgtctgcg 2700
gccagtgtga atacaagtgc tgcaggagcg cacactcata gtgfccagcgg cacagctgca 2760
tctgcaggtg ctcacgcaca tactgtcggt attggtgctc atacgcactc tgttgcgatt 2820
ggctcacatg gacacaccat caccgttaac gctgcgggta acgcggaaaa cactgtcaaa 2880
aacatcgcat ttaactacat tgtgaggctt gcataa 2916
<212> Type : DNA
<211> Length : 2916

SequenceName : SEQ ID 387

SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

atgaaacgag ttattaccct gtttgctgta ctgctgatgg gctggtcggt aaatgcctgg 60
tcattcgcct gtaaaaccgc caatggtacc gctatcccta ttggccjgtgg cagcgctaat 120
gtttatgtaa accttgcgcc tgccgtgaat gtggggcaaa acctggtcgt agatctttcg 180
acgcaaatct tttgccataa cgattatccg gaaaccatta cagactatgt cacactgcaa 240
cgaggctcgg cttatggcgg cgtgttatct aatttttccg ggacccjtaaa atatagtggc 300
agtagctatc catttccgac caccagcgaa acgccgcggg ttgttfcataa ttcgagaacg 360
gataagccgt ggccggtggc gctttatttg acgcctgtga gcagtgcggg cggggtggcg 420
attaaagctg gctcattaat tgccgtgctt attttgcgac agaccaaaaa ctataacagc 480
gatgatttcc agtttgtgtg gaatatttac gccaataatg atgtggtagt gcctactggc 540
ggctgcgatg tttctgctcg tgatgtcacc gttactctgc cggactaccc tggttcagtg 600
ccaattcctc ttaccgttta ttgtgcgaaa agccaaaacc tggggtatta cctctccggc 660
acaaccgcag atgcgggcaa ctcgattttc accaataccg cgtcgttttc accagcgcag 720
ggcgtcggcg tacagttgac gcgcaacggt acgattattc cagcgaataa cacggtatcg 780
ttaggagcag taggaacttc ggcggtaagt ctgggattaa cggcaaatta cgcacgtacc 840
ggcgggcagg tgactgcagg gaatgtgcaa tcgattattg gcgtgacttt tgtttatcaa 900
taa 903
<212> Type : DNA
<211> Length : 903

SequenceName : SEQ ID 388
SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgggctacg ttacaggtgg attaccaatg aagaataacc gtgcgtgggc gcttatcagt 60
ggtctgatat tgttcagcgg aacggcccca gctgccgata acctgcattt taccggtaat 120
ttgcttggta aatcctgtac tcctgtaatc aatggcaact tacttgcaga aattcatttc 180
cccacaattg ctgccagcga tttaatgcaa cgtggtcagt cagatcgcgt accgttagtt 240
tttcagttga aagattgcaa aagcaccacg gcgtttaatg tcaaggtgac cttgatggga 300
acagaagata ccgacttacc aggatttctg tcgattgatt cgtcatcttc tgcaacgggt 360
gttgggattg gcattgaaac tgccggaggg gcggctgtac ctattaacag taccacaggt 420
gcctcatttc cattaaatca gggaaataac agtgtcaatt ttaatgcctg gttacagacc 480
gtaaatggac gaaatgttac atcgggtgat ttcaccgcca caatgacggt aacttttgag 540
tatttttaa 549
<212> Type : DNA
<211> Length : 549

SequenceName : SEQ ID 389
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgaaacgac atctgaacac cagctacagg ctggtatgga atcacattac gggcaccctg 60
gtggtggcct ccgaactggc ccgctcacgg ggaaaacgcg ccggtgtggc ggttgcgctg 120
tctcttgctg ctgtcacatc agtcccggca ctggctgctg acaaggttgt acaggcggga 180
gaaaccgtga acgatggaac actgacaaat catgacaacc agattgtctt cggtacggcc 240
aacggaatga ccatcagtac cgggctggaa ctggggccgg acagtgaaga aaacaccggt 300
gggcaatgga tacagaatgg cgggatagcc ggaaacacca ctgtcaccac aaatggtcgt 360
caggtcgtgc tggagggggg aacagccagt gatacggtta ttcgtgacgg cgggggacag 420
agcctgaacg gactggcggt gaacaccaca ctgaataaca gaggcgagca gtgggtgcat 480
gagggcgggg ttgccaccgg tacaattatc aaccgcgacg gttaccagag cgttaaaagt 540
ggcgggctgg caacaggaac catcatcaac accggcgcag aaggcggccc tgattctgac 600
aactcgtata cgggtcagaa ggtccaggga acagcagaat ccaccaccat caacaaaaat 660
ggacggcaga ttatcttatt ttccgggcta gcccgtgaca ctctcattta cgcaggtggt 720
gaccagtcgg tacacggaag ggccctgaat accacactga atggcggtta ccaatatgtg 780
cacagggacg gacttgcgct gaacacggta attaacgagg ggggctggca ggttgttaag 840
gcaggtggcg ctgccggtaa caccaccata aatcagaacg gtgaactgag ggtacatgcc 900
ggcggggaag ccactgcagt cacccagaac acgggcggtg cactggttac cagtactgct 960
gcaactgtca tcggcacaaa ccgtctgggg aatttcacgg tggaaaacgg taaggctgac 1020
ggtgttgttc tggaatccgg cggtcgtctg gatgtactgg agagccattc agcacagaat 1080
accctagtgg atgacggcgg taccctggca gtgtctgccg gcggtaaggc gacaagtgtc 1140
accataacat ccggtggtgc cctgattgca gacagtggtg ccactgttga ggggaccaat 1200
gccagcggta agttcagtat tgatggcaca tccggtcagg ccagcggcct gctgctggaa 1260
aatggcggca gctttacggt taatgccggg ggacaggctg gcaacaccac tgtcggacat 1320
cgtggaacac tgacgctggc tgccggggga agtctgagtg gcagaacaca gctcagtaaa 1380
ggcgccagta tggtactgaa tggtgatgtg gtcagtaccg gcgatattgt taacgcaggg 1440
gagattcgct ttgataatca gacgacaccg aatgccgcgc tgagccgtgc tgttgcaaaa 1500
agtaactccc cggtaacgtt ccataaactg accaccacga acctcaccgg ccagggcggc 1560
accatcaata tgcgtgttcg ccttgatggc agcaatgcct ctgaccagct ggtgattaat 1620
ggtggtcagg caaccggcaa aacctggctt gcgtttacaa atgtcggaaa cagcaacctc 1680
ggggtggcaa ccaccggaca gggtatccgg gttgtggatg cacagaatgg cgccaccaca 1740
gaagaaggtg cgtttgccct gagtcgcccg cttcaggccg gcgcctttaa ctacaccctg 1800
aaccgtgaca gcgatgaaga ctggtacctg cgcagtgaaa atgcttatcg tgctgaagtc 1860
cccctgtata catccatgtt gacacaggca atggactatg accggattct ggcaggctcc 1920
cgcagccatc agaccggtgt aaacggtgaa aataacagcg tccgtctcag cattcagggc 1980
ggtcatctcg gtcacgataa caacggcggt attgcccgtg gagccacgcc ggaaagcagc 2040
ggcagctatg gcttcgtccg tctggagggt gacctgctca gaacagaggt tgccggtatg 2100
tctctgacga caggggtgta tggtgctgca ggccattctt ccgttgatgt taaggatgat 2160
gacggttccc gcgccggcac ggtccgggat gatgccggca gtctgggcgg atacctgaat 2220
ctggtacaca catcctccgg cctgtgggct gacattgtgg cccagggaac ccgtcacagc 2280
atgaaagcgt catcggacaa taacgacttc cgcgcccggg gctggggctg gctgggctca 2340
ctggaaaccg gtctgccctt cagtatcact gacaatctga tgctggagcc acaactgcag 2400
tacacctggc agggactctc cctggatgac ggccaggata acgccggtta tgtcgaagttc 2460
gggcatggca gtgcacaaca tgtgcgtgcc ggtttccgtc tgggcagcca caacgatatg 2520
acctttggtg aaggcacctc atcccgtgac accctgcgcg acagtgcaaa acacagtgtg 2580
agtgaactgc cggtgaactg gtgggtacag ccttctgtta tccgcacctt cagctcccgg 2640
ggtgacatga gcatggggac agccgcagcc ggcagtaaca tgacgttctc accgtcccgg 2700
aatggcacgt cactggacct gcaggccgga ctggaagccc gtatccggga aaatatcacc 2760
ctgggcgttc aggccggtta tgcccacagc gtcagcggca gcagcgctga aggctataac 2820
ggtcaggcta cgctgaatat gactttctga 2850
<212> Type : DNA
<211> Length : 2850

SequenceName : SEQ ID 390

SequenceDescription :



Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgaaaaaat ggcattatat attttgcata attctctttc atttagggtt accgtgcggg 60
tatgcggcaa atgatggcac gtgtgcaaca agaggcggca cacatacatt aagccttaat 120
tttcctctga caacggtcag tgcagcaaac aatgtgcctg gaaatacatt aatagatatt 180
gctaatgcaa catcttctga aaattatagc gttctgtgta actgtgattc aaaacatagc 240
aatggcgctt atcacgaaat atattatacc gcagaccctg ctcccggtat ggtttatagc 300
accaccgcaa gtggtcttgc tttttactat cttaacgaat atgtcgatgt gggaacaaaa 360
atatctgtgc taaatgcggg gtatacggca gttccttttg aacatgtttc caaccaggca 420
actacaacag atcacacttg tcagggaaac aaaactacag cggttggcgt gagcctgaaa 480
actggagcag atgcgaagat ttcatttcgt attaaacgtt caataaatgg aacggtagta 540
atacctatca ccgatattgc attgctgtat gccaacatat ccagcaccac gacccgtggt 600
gaggcgattg caaaagttcg aatttcaggc agtttgaccg caccacagtc ttgjtcagata 660
aatgcaggac aggtgattta ttttgatttt gatactattc ctgcgtccga attttcatct 720
accgccgggc aagccattac ttcacgaaaa atcactaaaa cagtgagtat tgagtgtacg 780
gggatggggt atgagcgtac gcagaaagtc gatgcttctt ttacggggac gaaccgaagc 840
agtgacgata cgatggtggc gacagacaat gctgatgtcg ggatcaaaat ttacaataaa 900
tcgaatgctg aagttagcgt caacaacggc aagttacccg cagacatggg caacacgacc 960
atttttggtc gtaaaaatgg ttcggtaact ttttcggcag cacctgccag ctttaccggt 1020
gcccggcctc agcccggcgt ttttaacgct accgcgacct taaccattga attntgtaaac 1080
taa 1083

<212> Type : DNA
<211> Length : 1083

SequenceName : SEQ ID 391
SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

atgtcacgtt ataaaacagg tcataaacaa ccacgatttc gttattcagt tctggcccgc 60
tgcgtggcgt gggcaaatat ctctgttcag gttctttttc cactcgctgt cacctttacc 120
ccagtaatgg cggcacgtgc gcagcatgcg gttcagccac ggttgagcat gggaaatact 180
acggtaactg ctgataataa cgtggagaaa aatgtcgcgt cgtttgccgc aaatgccggg 240
acatttttaa gcagtcagcc agatagcgat gcgacacgta attttattac cggaatggcc 300
accgctaaag ctaaccagga aatacaggag tggctcggga aatatggtac agcgcgcgtc 360
aaactgaatg tcgataaaga tttctcgctg aaggattctt cgctggaaat gctttatccg 420
atttatgata cgccgacaaa tatgttgttc actcaggggg caatacatcg tacagacgat 480
cgtactcagt caaatattgg ttttggctgg cgtcattttt caggaaatga ctggatggcg 540
ggggtgaaca cctttatcga ccatgattta tcccgtagtc atacccgcat tggtgttggt 600
gcggaatact ggcgcgatta tctgaaactg agcgccaatg gttatattcg ggcttctggc 660
tggaaaaaat cgccggatat tgaggattat caggaacgcc cggcgaatgg ttgggatatc 720
cgcgcagagg gctatttacc tgcctggccg cagcttggcg caagcctgat gtstgaacag 780
tattatggcg atgaagtcgg gctgtttggt aaagataagc gccagaaaga cccgcatgct 840
atttctgccg aggtgaccta tacgccagtg cctcttctga cactgagcgc cgggcataag 900
cagggcaaga gcggtgagaa tgacactcgc tttggcctgg aagttaacta ccgaattggc 960
gaacctttgg cgaaacaact cgatacggat agcattcgcg agcgtcgggt actggcaggc 1020
agccgctatg acctggttga gcgtaataac aacatcgttc ttgagtaccg caaatctgaa 1080
gtgatccgta ttgctctgcc tgagcgtatt gaaggtaagg gcggtcagac actttccctg 1140
gggcttgtgg tcagcaaagc aactcacgga ctgaaaaatg tgcagtggga agcgccgtca 1200
ttactggctg aaggtggcaa aattaccggt cagggtagtc agtggcaagt aacgctcccg 1260
gcttatcgtc caggcaaaga caattattat gcgatttcag cagttgccta cgataacaaa 1320
ggcaatacct caaaacgcgt gcagacagag gtggtcatta ccggagctgg tatgagcgcc 138 0 

gatcgcacgg cgttaacgct tgacggtcag agccgtattc aaatgcttgc taacggtaat 1440 

gagcaaaaac cgctggtgct gtctctgcgc gacgccgagg gccagccagt cacgggcatg 1500 

aaagatcaga tcaagactga actaactttc aaaccggctg gaaatattgt gactcgttcc 1560 

5 ctgaaggcca ctaaatcaca ggcaaagcca acactgggtg agttcaccga aactgaagca 1620 

ggggtgtatc agtctgtctt tactaccgga acgcagtcag gtgaggcaac gattactgtt 1680 

agcgttgatg gcatgagcaa aaccgtcact gcagaactgc gggccacgat gatggatgtg 1740 

gcaaactcca ccctgagcgc taacgagccg tcaggtgacg tggttgctga tggtcagcaa 18 00 

gcctatacgt tgacgttgac tgcggtggac tccgagggta atccggtgac gggagaagcc 18 60 

10 agccgcttgc gatttgttcc gcaagacact aatggtgtaa ccgttggtgc catttcggaa 192 0 

ataaaaccag gcgtttacag cgccgcggtt tcttcgaccc gtgccggaaa cgttgttgtg 1980 

cgtgctttca gcgagcagta tcagctgggc acattacaac aaacgctgaa gtttgttgcc 2040 

gggccgcttg atgcagcaca ttcgtccatc accctgaatc ctgataaacc ggtggttggg 2100 

gggacagtta cggcaatctg gacggtaaaa gatgcctatg acaaccctgt gaccagcctc 2160,„ 

15 acgccggaag cgccgtcatt agcgggtgcc gctgctgaag gttctacggc atcgggctgg 2220 

acaaataatg gtgatgggac gtggactgcg cagattactc tcggctctac ggcgggtgaa 228 0 

ttagaagtta tgccgaagct aaatggacag aatgcggcag caaatgcggc aaaagtaacc 2340 

gtggtggctg atgcgttatc ttcaaaccag tcgaaagtct ctgtcgcaga agatcacgta 240 0 

aaagccggcg aaagcacaac cgtgacgctg gtggcgaaag atgcgcatgg caacgctatc 2460 

20 agtggtcttg cgttgtcggc aagtttgacg gggaccgcct ctgaaggggc gaccgtttcc 2520 

agttggaccg aaaaaggtaa cggttcctat gttgctacgt tgactacagg tggaaagacg 2580 

ggcgagcttc gcgtcatgcc tctcttcaac ggccagccag cagccaccga agccgcgcag 2 640 

ttgacggtca tcgccggaga gatgtcatca gcgaactcta cgcttgttgc ggacaataag 27 00 

gctccgaccg tcaaaacgac gacggaactc accttcaccg tgaaggatgc gtacgggaac 2760 

25 ccggtcaccg ggctgaagcc agatgcacca gtgtttagcg gtgccgccag cacggggagt 282 0 

gagcgtcctt cagcaggaaa ctggacagag aaaggtaatg gggtctacgt gtcgacctta 2880 

acgctgggat ctgccgcggg tcagttgtct gtgatgccgc gagtgaacgg ccaaaatgcc 2940 

gttgctcagc cactggtgct gaacgttgca ggtgacgcat ctaaggctga gattcgtgat 3 000 

atgacagtga aggttaataa ccaactggct aatggacagt ctgctaacca. gataaccctg 3 060 

30 accgttgtgg acacctatgg taacccgttg caggggcagg aagttacgct gactttaccg 3120 

cagggtgtga ccagcaagac ggggaataca gtaacaacta atgcggcagg- taaagcggac 3180 

attgagctta tgtcaacggt tgcgggagaa cacaatattt ccgcttcggt gaatggtgct 3240 

cagaagacgg tcacggtgaa attcaacgcg gatgccagca ccggtcaggc aaacctgcag 3300

gtagacgccg ctgctcaaaa agtggcaaac ggcaaagatg cctttacgct gacggcgaac 3360

gttgaggata aaaatggtaa ccctgttcca gggagcctgg tgacctttaa tctgccccgg 3420

ggtgtcaagc cgcttacagg cgataatgtc tgggtgaaag ccaacgatga ggggaaagca 3480

gagttgcagg tggtttcagt gactgccgga acgtatgaga tcacggcatc ggcagggaat 3540

agccagcctt cgaatacgca gactataacg tttgtagccg ataaggctac cgcaaccgtc 3600 

tccggtattg aggtgattgg caactatgca ctggcggacg gcaatgccaa acagacgtat 3660 

aaagttacgg tgactgatgc caataacaac ctgttgaaag atagcgaagt gacgctgact 3720

gccagcccgg caaatttagt tctgactccc aatgggacgg cgaaaactaa tgagcaagga 3780

caggctattt tcaccgccac gaccactgtc gcagcgaaat atacactcac ggcgaaagtg 3840 

agtcaggccg acggtcagga atcgacgaaa actgccgaat ctaaattcgt cgcggatgat 3900 

acaaatgcag tactcaccgc atcatctgat gtgacttctc tggtggcgga tgggatatcg 3960

actgcgaagc tggaggtgac actgatgtcg gcaaataacc ccgttgggg^ gaatatgtgg 4020

gtcgacatta agacgccaga aggggtgacg gagaaggatt atcagttcct gccgtcgaaa 4080

aatgaccatt tcgtgagcgg aaaaatcacg cgtacattta gtaccagcaa gcctggtgtc 4140

tatacgttca catttaacgc cctgacgtat ggcgggtacg aaatgaagcc agtgacggtg 4200

accattaccg cggtggatgc cgatacggca aagggcgagg aggcgatgaa ctaa 4254

<212> Type : DNA 
<211> Length : 4254 

SequenceName : SEQ ID 392 
SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atggcgcgtg gttgggcgtc ttcagaagcc tcaggcgcga tgactgattg gttaaataac 60

tttggtactg cgagaatctc tctgggtgtg gatgaagatt ttagcctgaa aaattcgcaa 120 

ttcgacttcc tgcatccgtg gtatgacaca cctgattatc tgctcttcag ccagcatacc 180 

cttcaccgaa cagacgatcg tacccagatc aacaccggtt tgggctggcg tcatttcacc 240 

tccagctgga tgtcaggcat caaccttttt tttgaccacg acctgagccg ctatcactcc 300

cgcgcagggc ttggcgcaga atactggcgt gattatctga agttgagcag caacgcttat 360

atcggcctga ccggctggcg tagcgcacca gaattggata acgacttcga agcccgcccg 420

gccaacggct gggatttacg cgcggaaggc tggttacctg cctggccaca actgggggga 480
aaactggtct atgaacaata ctatggcgat gaagtggcgc tgtttgacaa gaatgatcgt 540
caaagtaacc cccatgctat taccgcaggc ctcaactata cccccttccc gcttctgact 600
ctcagtgcgg aacagcgtca ggggaagcaa ggtgaaaatg acacacgttt tgccgttgat 660
ctgacctggc aacccagcag ttcaatgcag aaacagctta atccggacga agtggccgga 720
cggcgcagtc tggccggtag tcgttatgac ctgattgatc gcaacaacaa catcgttctg 780
gaataccgca agaaagagct gattcgcctg agtctgctgg atccggtgaa agggaagtct 840
ggagaaataa aaccgctggt ttcctcgcta cagaccaaat atgcccttaa aggctataac 900
atcgaagccg ctgcgctgga agctgccgga ggtaaagtca gcacgtctgg aaaagatatc 960
acggtcacgc tgccaggtta ccgcttcact aacaccccag aaaccgataa tacatggtcg 1020
atagacgtta ccgccgagga tgtaaaaggt aacctgtcac ggcatgaaca aagcatggta 1080
gttattcagg ctccgacatt aagccagaaa gattctctgt tatccgtcaa tccgctaacc 1140
gtggctgcag ataaaaaatc gacgaccaca ttgaccgtta ctgcgcacga ttccgacgga 1200
actccggtgc cggggctggc gctgcaaacc cgcagtgaag gcgttcagga tatcaccctg 1260
tctgactgga cagataacgg tgatggtagt tacacacaga tactgacqgc cggaacgaca 1320
tcaggttcag taacactgac gccgcaaatt aacggtgaga gtgcggtaaa agaatccatc 1380
gtcgttaata tcgtccctgt tgtctcatcc cgcgaccatt catcaataac aattgataac 1440
gtatcgtatt atgccggaga cgacatcaag gttagggtgg aactgaaaga cgatagcaat 1500
caaccggttg catatcaaaa agaggaattg gtaaaagccg ttactgtcga aaacagcaaa 1560
cctggcgcca cgattgtctg gcacgaagag cagccggggg tttatgccgc gaattatccg 1620
gcctataagc aagggactgc actaagggca caacttagcc ttcacaactg gaatgctcca 1680
ctgcaatcgc atatttataa cattgaggca aaccagaata aggctcgcgt tgccacatta 1740
tcagcgacaa ataatgacgt ttacgccgat aaaaagacat ttaataccct cacgatcaac 1800
gtcactgatg agagtgataa tcccctgaca aatcatcagg tcacctttaa gaatgaaaaa 1860
ggaagcgcgg agtttgtcga accgccgcag caaaatacgg atgcatatgg tgttgccaca 1920
ataaacatgg taagtcaggt tgcggaagaa aatacgatta gcgccacgct gccaaatggt 1980
ttttcacaac ggataattgc gaaattcgtt agcgattcga gtacgccaaa attcaaacaa 2040
ctggttgccg atccagatac cattattgct ggcaacagcc agggcagtac tctgaccgcc 2100
atcatcacag actttcataa caacccgtta aaagatatga aagtgaattt tgtggcacct 2160
ggtggctcgc aactggacaa cacgaccgcc acaacagacc agtccggtat tgtgcgggtg 2220
cacctgacca gttcaaaagc tggtagctat tccgtcgatg cctcgcttga ggtggataaa 2280
aatattcacc agtcggtcac gatcaccgtg gtcccaaaca gggaacaatc ggtaatgacc 2340
ttgaatgccg ggtcgggcag tgcgatcgct aacaatacaa atatcgttac cctgactgcc 2400
agtgtgaaag atgtttatgg acacccgttg ccggatgagg atgtgaaatt taccttgcca 2460
gcctccatga ccgggaactt cacgctaagt agtgaaaccg cccgcaccga tgcaaacggt 2520
gatgccgtgg tcacattgcg aggcacaaaa gcgggtgagt ttacagttac ggcgacgctg 2580
accagaaata ataccgttgc ttatcagcaa gtcactttta ttggggatac aaacagtgcg 2640
cagctccagc cgctgactgc ctcattaaat tccattgttg cgggtaacag tacggggagt 2700
accctgacgg caacgatcct ggacgcttac caaaatccgc ttaaagacca gttggtcact 2760
ttccagagta acgatgtcac tctaagcgaa acagaagtca ccaccaatac gctgggtcag 2820
gcgacggtaa caatgaccag caatattgcc ggacaacata acgtcgtggt gagccggaaa 2880
gcgcaagctt ccgataataa aacgtttagt ttatcagtgc taccggatga aagttcggcg 2940
aaggtaataa gtataaccgg agccgaaaaa acgataacgg tgggcgaaaa catcacgcta 3000
cggatactcg tccaggacgc gtttaacaat gtaatcgcgg gtcaacgcgt cagattaagt 3060
gcgcagccaa caactaacat tacgataggc gatacggctt acaccgataa taacggttat 3120
gcgtacgtta accttctcag cacccaacct ggggtttatc aggtgacggc aacgctggac 3180
aataacagta gtagtaaggt tgacgtgaat gtggcaaatg gcaaactcga gttaacatca 3240
tcgaaaccag aaactacggt ccataatagt gagggtatta cgctgaccgc aacggcgaga 3300
aatgcgcggg gtgaattgat gccagggcaa attatcacct ttagcgtaac gcctgaaggt 3360
gcaacgctaa gcaatacagg ggaagtcctt actgaccagt caggtcaggc caaagtgacg 3420
ctgaccagtg acaaagtgaa tgtctatacc gttacggcca taatgggcaa agatgttccc 3480
gttcagagcc aggtaacggt tgcggttaag gcagatgcta aaacggcaca tgttgtgagc 3540
gtcgtggctt ctcctgacac catcaccgcc gacggcatcg atagcagcac catcacttca 3600
cgagtagaag atgattacgg attcccggtt gaaggtgtcg atattagtca tggcttagac 3660
accaaaggca gcccggtagt taatattcca actacgcgta ccgatcagtc cgggcaagtc 3720
acggcgacaa taaccagtac attggcagaa accttaacag tcaatgtgca agttcctggc 3780
acagccaacc aatccgcaac cattacattg gttgccggca cggccgatga aagtaagtca 3840
attttgaaat ccgatgttga cactctgaag gctgactacc agcagagcgc aaaacttacg 3900
ctaacattgc aagacaagta cggtaacccg atagtgacgt ctgatcatct ggaatttgtc 3960
cagtcaggcc ccttcgtgaa ctttctcaag ttgagcgata ttgattacag ccaaagaaat 4020
tatggcgagt acaccgtgac tgtcactggc ggaaaagagg gaacagcgac actcattccc 4080
atgctgaacg gggttcatca ggcaaactta agcatatcgc tgaatctcat ccaatcgata 4140
aaagaaatgt ccggtcatgt cactgcaaac aaccatacct tctccacggc taaattcccg 4200
agcgaaggct ttgcaggagc gtattacaca ctcaacaatg ataactttga agcgggtaaa 4260
accgttgatg attatatgtt ttcaagttca cagggttggg tgtctgtcga tgcttcgggt 4320
aaagtttctt tcgcaaatat cggcgatcaa acgtcagtca caataagcgc tgttccccga 4380
caaggaggta caacctacca gaccttaatt aagctgaaag gctggtgggt gaataatgga 4440
aatcatacca atatctggct agctgccaat gcgctctgtc atgctaaaaa tgatggatat 4500
aatcttcctg gcatcacaca tttgacgtct ggcgaaaaca aacgcacgca gggatcactg 4560
tatggtgaat gggggaacgt tggagcgttt tccagtaatt cgcaatttac accgggagct 4620
tactggacaa gtgaatctga tgattacagt cggcactact atgtgcagat gctaaccggt 4680
atgaccggaa gcgacgctga ttccagcccc caactgaccg cctgccgtaa atcactttaa 4740



<212> Type : DNA 

<211> Length : 4740 

SequenceName : SEQ ID 393
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgattactc atggttgtta tacccggacc cggcacaagc ataagctaaa aaaaacattg 60

attatgctta gtgctggttt aggattgttt ttttatgtta atcagaattc atttgcaaat 120

ggtgaaaatt attttaaatt gggttcggat tcaaaactgt taactcatga tagctatcag 180

aatcgccttt tttatacgtt gaaaactggt gaaactgttg ccgatctttc taaatcgcaa 240

gatattaatt tatcgacgat ttggtcgttg aataagcatt tatacagttc tgaaagcgaa 300

atgatgaagg ccgcgcctgg tcagcagatc attttgccac tcaaaaaact tccctttgaa 360

tacagtgcac taccactttt aggttcggca cctcttgttg ctgcaggtgg tgttgctggt 420 

cacacgaata aactgactaa aatgtccccg gacgtgacca aaagcaacat gaccgatgac 480 

aaggcattaa attatgcggc acaacaggcg gcgagtctcg gtagccagct tcagtcgcga 540 

tctctgaacg gcgattacgc gaaagatacc gctcttggta tcgctggtaa ccaggcttcg 600 

tcacagttgc aggcctggtt acaacattat ggaacggcag aggttaatct gcagagtggt 660

aataactttg acggtagttc actggacttc ttattaccgt tctatgattc cgaaaaaatg 720 

ctggcatttg gtcaggtcgg agcgcgttac attgactccc gctttacggc aaatttaggt 780 

gcgggtcagc gttttttcct tcctgcaaac atgttgggct ataacgtctt cattgatcag 840 

gatttttctg gtgataatac ccgtttaggt attggtggcg aatactggcg agactatttc 900 

aaaagtagcg ttaacggcta tttccgcatg agcggctggc atgagtcata caataagaaa 960

gactatgatg agcgcccagc aaatggcttc gatatccgtt ttaatggcta tctaccgtca 1020 

tatccggcat taggcgccaa gctgatatat gagcagtatt atggtgataa tgttgctttg 1080 

tttaattctg ataagctgca gtcgaatcct ggtgcggcga ccgttggtgt aaactatact 1140 

ccgattcctc tggtgacgat ggggatcgat taccgtcatg gtacgggtaa tgaaaacgat 1200 

ctcctttact caatgcagtt ccgttatcag tttgataaat cgtggtctca gcaaattgaa 1260

ccacagtatg ttaacgagtt aagaacatta tcaggcagcc gttacgatct ggttcagcgt 1320

aataacaata ttattctgga gtacaagaag caggatattc tttctctgaa tattccgcat 1380

gatattaatg gtactgaaca cagtacgcag aagattcagt tgatcgttaa gagcaaatac 1440 

ggtctggatc gtatcgtctg ggatgatagt gcattacgca gtcagggcgg tcagattcag 1500 

catagcggaa gccaaagcgc acaagactac caggctattt tgcctgctta tgtgcaaggt 1560

ggcagcaata tttataaagt gacggctcgc gcctatgacc gtaatggcaa tagctctaac 1620

aatgtacagc ttactattac cgttctgtcg aatggtcaag ttgtcgacca ggttggggta 1680 

acggacttta cggcggataa gacttcggct aaagcggata acgccgatac cattacttat 1740 

accgcgacgg tgaaaaagaa tggggtagct caggctaatg tccctgtttc atttaatatt 1800

gtttcaggaa ctgcaactct tggggcaaat agtgccaaaa cggatgctaa cggtaaggca 1860

accgtaacgt tgaagtcgag tacgccagga caggtcgtcg tgtctgctaa aaccgcggag 1920 

atgacttcag cacttaatgc cagtgcggtt atattttttg atcaaaccaa ggccagcatt 1980 

actgagatta aggctgataa gacaactgca gtagcaaatg gtaaggatgc tattaaatat 2040 

actgtaaaag ttatgaaaaa cggtcagcca gttaataatc aatccgttac attctcaaca 2100

aactttggga tgttcaacgg taagtctcaa acgcaagcaa ccacgggaaa tgatggtcgt 2160

gcgacgataa cactaacttc cagttccgcc ggtaaagcga ctgttagtgc gacagtcagt 2220

gatggggctg aggttaaagc gactgaggtc actttttttg atgaactgaa aattgacaac 2280

aaggttgata ttattggtaa caatgtcaga ggcgagttgc ctaatatttg gctgcaatat 2340

ggtcagttta aactgaaagc aagcggtggt gatggtacat attcatggta ttcagaaaat 2400 

accagtatcg cgactgtcga tgcatcaggg aaagtcactt tgaatggtaa aggcagtgtc 2460

gtaattaaag ccacatctgg tgataagcaa acagtaagtt acactataaa agcaccgtcg 2520 

tatatgataa aagtggataa gcaagcctat tatgctgatg ctatgtccat ttgcaaaaat 2580 

ttattaccat ccacacagac ggtattgtca gatatttatg actcatgggg ggctgcaaat 2640

aaatatagcc attatagttc tatgaactca ataactgctt ggattaaaca gacatctagt 2700

gagcagcgtt ctggagtatc aagcacttat aacctaataa cacaaaaccc tcttcctggg 2760

gttaatgtta atactccaaa tgtctatgcg gtttgtgtag aataa 2805 
<212> Type : DNA 
<211> Length : 2805 

SequenceName : SEQ ID 394 

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgttagtac ttagcgaaag cttcaagaat aaattgcttc ccatgaatgg gtatatgaaa 60

ggcggcagcg actccggatc taaagcccag gcacgcgcaa ctgaaaaggg catcgaactg 120

cagcgtgaaa tgtggcagac gaacatgcaa aaccttgcac cgttcacgcc actcgctcag 180 

cagtacgtat cacagttgca gaatctttcc tctcttcagg ggcaaggtca ggcgcttaac 240 

cagtattaca actctcagca gtataaagac cttgcagggc aggcgcgcta tcagagtctg 300

gcagcagcag aggcaacggg tggattaggc tctacagcaa caggaaacca gttagcagca 360

atcgcaccta cactcggtca aaactggctg tcaggtcaga tgaacaacta caacaatctg 420

gcaaatatcg gccttggtgc tcttacaggt caggcaaacg ccggacagaa ctacgctaac 480 

aatgtcagcc aattgtatca acagcaggcg gcagcatcgg cagcaaatgc gaataagcct 540

tcaggcctac agagttttgc tacaggtgcc attggtgggg ccgcatcagg tgcaatgatt 600

ggtagtgcag ttcctgttat tgggactggt attggtgctc ttgctggcgg tgttatcggt 660

ggtcttggat cattgtttta a 681
<212> Type : DNA 
<211> Length : 681 

SequenceName : SEQ ID 395
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaaaaaa tattatcagg gttgattctg ctgctttgct gtccttatgg tttcgccgct 60
aatggtgatg gcgcaacgca catgtcaaat ttatcatttg gtccgctgac ggtggcagcg 120
gcgaataacc actccggata caatattttc gaggcactga gcaacacgac tggaacatac 180
ccggtgcgct gtcactgtga tgacacgcat ggcggaccgg gccaacaaac agcatttttt 240
cctatcttct acacgggaga tgccgcaccg gggcttgtgc ttgagcgcac tcttaatggg 300
ttaaattact atgctctgaa tgattattta tcggtcggcg tgacgatttt tattattaat 360
aaccaatatg cggccattcc ttttgaacac ttatccaacc aatccacctc accgcaacat 420
acctgtggag caggtaataa tgggagcact gtaaatctgg attcagggcg ctcggcaaaa 480
ttatctttct atgttcggca ttctattact ggcacggtga caatacccac aacggaagtc 540
gcctggttgt acgcgggcat gtccgatcat tttcccaaaa cgacccccgt ttctaaagtg 600
acaattcgcg gacaactaac ggccccgcag aactgtgagt taacgccaaa tcagagcatc 660
gatgtcgatt ttcaaaaaat taatagcgct gagttctcct caacggcggg ttcaattatt 720
gcagaaagaa agattaaaac cgaagtcaca gtatcctgta ccgggatgga agacgtaagg 780
tccacggagg tggtgagtgc gtcgatgatt gcggcaaaca gaagtgccga tgccaccatg 840
atcgtgacga gtaatccgga tgtgggaatt aagatttttg ataagaacga ccgtccagtg 900
aatgtggatg ggggcaactt acccgctgat atgggtgcta ttagtcgatt aggaaaaacc 960
gatggtagcg taacgtttta ttcagcgccc gccagtctga cgggcgcaaa accagcgcct 1020
gataatggat ttaccgctac agccacgctg gttattgaat ttactaacta a 1071
<212> Type : DNA
<211> Length : 1071

SequenceName : SEQ ID 396
SequenceDescription :



Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaataaaa tatatcggct aaagtggaac aggtcccgta actgttggag cgtctgctcg 60
gagctgggga gcagagtaaa aggaaaaaag tcccgggctg ttttaattag cgcgataagt 120
ttatattcat ctctggtatt cgccgatgat gtcatcgtaa accaggataa aactattgat 180
tttggcaaag agaaccagag catcgattac cgtattacgg tgacagacaa tgccaatctg 240
gtaatcaatg cgacagatac ttcccgtccg cgtctgactc tcgcttctgg tggtgggttg 300
gatattaccg gaggaaaggt aacaatcaat ggcccactta actttttgct gaaaggtacg 360
gggttcctga atgtctccaa tgctggcagc gagttatatg ctgatgattt gtatgaatca 420
aactcaggca tgagacacga tcgcggctat tttaatgtct ccaacggcgg caaaatccat 480
gttaagggca ccagccgtct gacctatttg cagggaaatg tcagtggtga aggtagccag 540
gtaaattccg aaaccttctt tatgggcgtt tacggcagtt acggtggtaa tcagtacctg 600
tcagttaata acggcggtga agttaatgcc aggaagcaaa ttagcctggg ctattatgat 660
caagtctccg atacaacact tgctgtttcg gaaggtggta aaatttctgc gcctactatt 720
agtttaagca ccaactctga gttagcgtta ggggcacagg aaggaagcgc agcgaaggca 780
gcagggatta ttgatgccga aaaaattgag tttgtgtggg caaagacatc cgagaagaaa 840
atcaccttaa accacacgga taaagacgcg actatttccg cggatattgt cagtggcagc 900
gagggcctgg gctatatcaa tgcgctcaat ggcacgactt acttaaccgg tgataactct 960

gcctttagtg gtaaagtcaa aattgagcaa aatggcgctt tagggatcac ccaaaatata 1020 

ggtacagcag agatcaacaa ccgcgggaaa ttacacctga aggctgacga tagcatgacc 1080

tttgccaata agatctctgg caacggtaca ataagtatcg acagtgggac ggtggagttg 1140

accggcaata actatgcatt cagcggatat attgatgttg cttctggtgc tgtcgctgtt 1200

atttctgaag acaagaatat cggtcgtgca gagctggatg tcgatggcaa attgcaaatt 1260 

aatgccaaca aagattgggt atttgataac gatcttgaag gtagaggcat tgttgaaata 1320 

aacatgggga atcacgaatt ctccttcgat gagtttgctt atacagactg gttccagggt 1380

tcactggcgt tccagaacac gacatttaat ctggaaaaga atgctgagtt tctgcagaaa 1440

ggcgggatca ctgcgggtca gggaagcctg gtaacagtgg gtaagggcgc tcactccatt 1500

agcactttgg gattctccgg cggaaccgtt gattttggtg ccctgacagc aggtgcacag 1560 

atgacagaag ggacggtcaa cgttagtaaa acgctggatt tgcgcggcga gggtgtgatt 1620 

caggtttctg acagtgacgt tgtccgctca gtatctcgtg atattgactc tgcgttatcg 1680

ctcactgaag tcgatgatgg taacagcacc attaagttgg ttgatgcgca aggtgcggaa 1740

gttctgggcg atgcgggcaa tctgcaattg caggataaaa atgggcaaat cctctccagc 1800

agcgcccaac gtgatattca gcagaatggg caaaaagcgg ccgtcggcac ttacgactat 1860

cgtctgacga gtggggtaaa caatgacggt ctgtatattg gttacggcct gacccagctt 1920

gatttacacg ctaccgacag cgatgctctg gtgctgagct ctaacggtaa aagcgagaat 1980 

gccgccgatc tcagcgcaaa gattaccggc agtggtgacc tggcattcag cagccagaag 2040 

ggtcagaccg tatcgctttc taacaaagac aacgactata ccggtgttac cgatctgcgc 2100

agtgggacgc ttttgttgaa taacgataac gtgttgggta atacccatga actgcgtctg 2160

gcggcagaga ctgaactgga catgaatggt cacagccaga ccgtgggcac gctcaatggc 2220 

agcgccgatt cactgctgag cttaaatggc ggcagtctga cggttaccaa cgggggcact 2280 

tcaaccggtt cgttaacggg gagcggagag ctgaatattc agggcggcac gctggacatc 2340 

gcgggcgata acagcaacct gacggcgaat gtgaacattg ctaattcggc taatgtcctg 2400

gtaagtcatg cgcagggatt gggtagcgca aacgttgaga acaacggtac cctggcgttg 2460 

aataatagcg ctgaaaaaag agcggctgcg tctgtgaatt acgccctggg cggcaatctg 2520 

accaacaacg gtacgctgat gaccggaatg tcaggacagc aagctggcaa tgtgttagtg 2580 

gtgaagggga actaccacgg taataacggt caactagtaa tgaatacggt actgaatggc 2640 

gatgactcag taaccgataa attggttgtc gagggcgata ctagcggcac gactgccgtt 2700

acggtgaata acgctggcgg tacaggtgcg aaaaccctta acggtatcga acttatccat 2760 

gtagacggta agtctgaggg cgaatttgtt caggctgggc gtatcgttgc gggggcgtat 2820 

gactacactc tcgcgcgtgg acaaggggca aatagtggta actggtatct gaccragcggc 2880 

agtgattctc ctgaactgca gccggagcca gacccgatgc cgaatccaga gccaaacccg 2940 

aatccagagc cgaaccctaa cccgacacct acgccgggtc cggatctgaa tgtggataat 3000

gacctgcgac cggaggcggg tagctacatt gcgaaccttg cagcagcgaa taccatgttc 3060

accacgcgtc tgcatgagcg tctgggtaat acgtactata ccgacatggt gacgggtgag 3120

cagaaacaaa ccactatgtg gatgcgccat gaaggtggtc ataataaatg gcgtgatggc 3180

agcggccagc tgaaaaccca aagcaatcgc tatgttctgc aactgggagg cgatgtcgcg 3240

cagtggagcc aaaacggcag cgaccgctgg catgttgggg tcatggcggg atatggcaac 3300

agcgacagca aaaccatttc ctcgcgaacc ggttatcgtg caaaagcgag tgtgaacgga 3360

tatagcacag gcctctatgc cacctggtat gccgatgacg agtcgcgtaa tggcgcgtat 3420

ctcgacagtt gggcgcagta cagctggttt gataacacag tgaaagggga tgacttacaa 3480

agtgaatcct ataaatcaaa aggatttacc gcttcactgg aagctggata caaacacaaa 3540

ttagctgaat ttaatggcag ccagggaacg cgtaatgaat ggtatgttca gccgcaagca 3600

caggttacct ggatgggagt caaagccgat aagcaccgcg aaagcaacgg aaccctcgtt 3660

catagcaacg gtgatggcaa tgttcaaacc cgacttggcg taaaaacctg gctgaagagc 3720

caccataaaa tggatgacgg taaatcccgc gagttccagc cgtttgtaga agtgaactgg 3780

ctacataaca gtaaggattt cagcaccagt atggatggcg tgtctgtcac tcaggatgga 3840

gcccgaaata ttgctgagat aaaaaccggg gtggaaggac agctaaatgc caacctgaat 3900

gtctggggga atgtgggcgt tcaggttgcc gataggggat ataatgacac ctctgcaatg 3960

gttggcatta agtggcaatt ctga 3984 
<212> Type : DNA 
<211> Length : 3984 

SequenceName : SEQ ID 397

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgataacga tgaaaaaaag tgtattgacg gcgtttataa ctgtggtatg tgcaacgtcc 60 

agcgttatgg ctgctgatga taatgctatc acggatggct cagtaacatt taatggtaaa 120

gttattgctc cagcttgtac cctggtagct gcgacgaaag attccgtggt gactttgcca 180

gatgttagtg ccacgaagtt gcaaaccaat ggtcaggttt ctggcgtgca aactgatgtg 240

ccaattgaat taaaagattg tgatactacc gtaacaaaaa atgcaacgtt cacctttaat 300

ggcactgcgg atactactca gattacagcg tttgctaacc aggcctcatc tgatgctgct 360

acaaacgtgg ccctgcaaat gtatatgaat gatggtacaa cggccatcaa gccagacaca 420

gaaaccggga acattttgtt gcaagatgga gatcagacgt tgacttttaa agttgattat 480

atcgctacgg ggaaagcgac ttctggtaat gtgaatgcgg taacaaattt ccatattaac 540

tattattaa 549
<212> Type : DNA

<211> Length : 549 

SequenceName : SEQ ID 398 

SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgagtaagt ttgtaaaaac agctattgct gcaacaatgg taatgggtgc gttcgcttct 60

acttcaacaa tcgccgctgg caacaatggt acagcacgtt tctacggcac cattgaagat 120

tccccgtgct ctatcgttcc tgatgatcac aaactggaag ttgatatggg tgacattggt 180

tcagggatcc tgaaaaataa cgggacttct acaccgaaag ctttccagat ccatctgcaa 240 

gactgtgtgt ttgacaccca gacaacgatg accactacct tcaccggtaa cgcgtcttct 300 

accaacagcg gcaattacta caccatttac aataccgata ctggtgcggc atttaacaat 360

gtcagcctgg ccattggtga cgctcaggga acctcttata aaagcggcgc gggtatcgaa 420

cagaaaatcg taaacgatac ggcgaccaac aaaggcaaag cgaagcagac gctggacttt 480

aaagcctggc tggtgggcgc tgctgatgcg ccagatttag gtaattttga agccaacacg 540 

accttccaga ttacttatct ctaa 564
<212> Type : DNA
<211> Length : 564

SequenceName : SEQ ID 399
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgcgggtta tctttctacg caaggagtat ttatctctac tcccgtcaat gattgcatct 60 

cttttctctg ctaacggtgt cgcggcggcc attgatttat gccagggata tgatatcaaa 120 

gcgagttgtc acgccagcag gcaaagcctt tcaggcatta cgcaggtctg gagtattgcc 180

gatgggcaat ggctggtttt ttcggatatg accaataatg ccagcggtgg ggccgtattt 240 

ttgcaacaag gagcggaatt tacattatca ccagaaaatg aaactggaat gactctgttt 300

gccaataaca ccgtttcagg agaatataat aacggcgggg caatatttgc taaagaaaac 360 

tcaacgctga atcttacgga tgttattttt tctggtaacg tcgcaggcgg ctatggtggc 420 

gcaatctatt cttctggtac taacgatacc ggtgccatcg atttacgtgt cactaacgcc 480

gtgtttcgca ataacatcgc taatgacggc aaaggtggtg caatttatac catcaataat 540 

gatatctatt taagtgatga tgtttttaac aataaccagg catatacatc aacaagttac 600 

agtgatggcg atggcggcgc aatcgatgtc acagataata atagcgacag caagcatcct 660 

tcaggttata cgataataaa taacactgcc tttacaaata acactgccga aggttatggc 720 

ggggcgatat ataccaatag cgcgacggct ccctatctta ttgatatttc tgttgatgac 780

agctacagcc agaacggagg cgtgttagtc gatgagaaca atagcgcagc aggctatgga 840 

gatggtcctt cctctgcggc gggtggcttt atgtatctcg gcttaagtga agttaccttt 900 

gatattgccg acggaaaaac gctggttatt ggcaatacag agaatgacgg agctgttgac 960 

tctattgctg gtaccgggtt aatcaccaaa acaggttccg gcgatctggt acttaatgca 1020 

gataacaatg actttactgg tgagatgcag attgaaaacg gtgaagttac cctgggccgc 1080

agcaactccc tgatgaatgt cggcgatacg cattgccagg acgatccgca agactgctac 1140 

ggtctgacga tagggagtat tgataagtac cagaatcagg cagagctaaa tgttggctcc 1200 

acccaacaaa cctttgcgca ctcattgacg ggctttcaga atggcacttt aaatatcgat 1260 

gctggtggca atgttactgt taatcaaggc agttttgctg gcaccatcga aggtgctggt 1320

cagctcacca ttgcgcaaaa cggcagctat gtgctggcgg gggcgcagtc gatggcgcta 1380

accggcgata tagtggtgga tgctggtgcg gtgctttcgc tggaaggcga cgcggcagat 1440

cttgccgctc tccaggacga tccgcagtcg atcgtgttaa acggcggtat gctcgatctc 1500 

tctgatttct ccacctggca gagcggtaca tcatacaaag atggccttga agtcagtggc 1560 

agcagcggaa cggttatcgg cagtcaggat gtggtagatc ttgcaggcgg aaacgatatg 1620

catatcggcg gcgacgggaa agatggcgtc tacgtggtga tcgatgcggg tgacgggcag 1680

gtcagcctgg caaatgacaa tcaatacctc ggcacaacgc aaatcgcttc cggtacgctg 1740

atggtgagcg acaactcgca gcttggatat acccattata accgccaggt tatctttacc 1800

gataagccac aagaaagcgt gatggagatt actgccaatg tcgatactcg ctctacaacg 1860

actgagcatg ggcgtgatat tgaaatgcgc gccgacggtg aagtggcagt tgatgcgggg 1920

gtagacacgc agtggggcgc actgatggct gacagcagcg ggcagcatca ggatgagggt 1980

agcacattga ctaaaacggg ggcgggtaca ctggagctga ccgccagcgg tacaacgcag 2040

tcagcggtga gagtagaaga gggcacgctg caaggtgatg ttgcggatat cttcccttat 2100
gcttcgtcgc tatgggtcgg tgacggggca acgttcgtta ctggcgcgga tcaggatatt 2160
cagtcaattg atgctacttc cagcggcact atcgacatca gcgatggtac ggttttgcgc 2220
ctgaccgggc aggatacttc cgtcgccctt aatgcctcac tgtttaactg cgatgggacg 2280
ctggtgaatg ccaccgatgg tgtgacgttg acaggtgagc ttaataccaa ccttgaaact 2340
gacagcctga cttatctttc caacgtgacg gttaatggca atctgaccaa tacgtccggt 2400
gcggttagcc tgcaaaatgg cgtcgctggc gatacgctga cggtaaacgg tgattatacc 2460
ggcggcggta cgctactgct cgatagcgaa ttaaacggcg atgactcggt aagcgatcaa 2520
ttggtgatga acggtaatac tgctggcaac acaactgtgg tggttaactc cattacaggg 2580
attggtgagc cgacatcgac aggcattaaa gtggttgatt tcgcagctga tcccacgcag 2640
tttcaaaaca atgcgcagtt cagtctggca ggcagcggct acgtcaatat gggagcgtat 2700
gactacacgc tggtggaaga taacaacgac tggtatctgc gatcgcaaga agtaacgccg 2760
ccatcgccac ctgatccaga cccgactccc gatcctgatc ccacgcagga tcctgatcca 2820
acacccgacc cggaacctac gcctgcttac cagccggtgt tgaatgccaa agttggcggt 2880
tatctcaata acctgcgggc ggcaaatcag gcgtttatga tggagcgacg cgatcacgca 2940
ggtggcgatg gtcagacgct gaatttacgt gttatcggcg gagattatca ttacacagca 3000
gcggggcaac tggctcaaca tgaagacact tctacggtgc aacttagcgg cgatctgttt 3060
agcgggcgct ggggcacgga tggcgagtgg atgcttggga ttgttggtgg ctacagcgat 3120
aaccagggcg acagccgctc gagtatgacc ggaactcgcg ccgataacca gaaccacggt 3180
tatgcggttg ggctgacctc aagctggttt cagcacggta agcagaagca gggggcctgg 3240
ctggataact ggttgcagta cgcgtggttt agcaatgatg tttctgaaca tgaagatggc 3300
gtggatcatt accattcgtc ggggattatc gcctcgctgg aagcggggta tcagtggtta 3360
ccggggcgtg gtgtggtgat tgaaccgcag gcgcaggtga tttatcaggg cgtgcagcag 3420
gatgatttta ccgccgctaa ccgtgcgcgc gtgtcacaat cgcagggtga tgatattcag 3480
acgcggctgg gtttacacag cgaatggcgt accgctgttc atgtcatacc aacattagat 3540
ctgaattatt atcacgatcc ccattcgacg gaaattgaag aagatgccag cactatcagt 3600
gacgatgcgg tgaagcaacg gggtgaaata aaagtgggag tcacgggcaa tatcagtcag 3660
cgagtttcgc tgcgtggtag cgtggcgtgg cagaaaggga gtgatgattt tgcccagacg 3720
gcagggtttt tgtcgatgac ggtgaaatgg taa 3753
<212> Type : DNA
<211> Length : 3753

SequenceName : SEQ ID 400
SequenceDescription :



Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgcactcct ggaaaaagaa acttatagta tcacaattag cattggcttg cactctggct 60
atcacctctc aggctaatgc agcgaccaac gatatttctg gtcaaactta caatactttc 120
catcactaca acgacgccac ctatgctgat gacgtttact atgatggtta tgtaggctgg 180
aacaactatg ccgctgatag ctattacaac ggcgatatct acccggtcat taataacgct 240
accgttaacg gcgtgatttc tacctactat ctggacgacg gtatttctac caataccaac 300
gccaatagtc tgacaatcaa aaacagcact attcacggta tgattacctc tgagtgcatg 360
actactgatt gtgctgatga ccgtgctact ggttatgttt atgatcgtct gacactgagc 420
gttgataatt caacgatcga tgacaactac gagcattata cttacaacgg tacctataat 480
aatgccgctg acactcatgt tgtagatgtt tacgatatgg gtactgctat tacactggat 540
caggaagttg atctgtccat cactaataac tctcatgtag caggtattac gctgactcag 600
ggttatgagt gggaagatat tgacgacaac acagtcagca ctggcgtaaa cagcagcgaa 660
gtgtttaata acactattac tgttaaagat tctactgtga cctctggttc atggactgat 720
gaaggtacta ctggttggtt tggccatact ggtaatgcca gcaactatag caacacgctg 780
actgcagacg atgttgcaat tgccgcaatc gcaaatccgt atgctgataa tgcgatgcag 840
actacagtaa ctttagacaa ctcaacactg atgggtgatg ttgttttctc cagtaatttc 900
gatgaaaact tcttcccgca aggtgctaac agctatcgcg atgctgatgg tgatgtagat 960
accaacggtt gggatggcac agaccgtatg gatgtgactc tgaacaacgg cagcaagtgg 1020
gttggcgctg caatgtctgt tcatatggtt gatgaagatg gtgatggttc ttacgacgga 1080
tatgctgttg gtactgaagc aactgcaact ctgctcgata ttgcagctaa cagcctgtgg 1140
ccttcatcaa ctgtcggtgt tgataacatc aatactcaat atgacgaaaa tggccatatc 1200
gtaggaaacg aagtttacca gagcggtttg tttaatgtga ctttgaacgg tggttcagag 1260
tgggatacaa caaaatcttc tctgattgat actttaagta ttaacagcgg ttcccaagtt 1320
aatgttgcag actctcgtct gatctctgac actgtctctc tgactggcgg ttctaacctg 1380
aacatcggtg aagacggtca tgtagcgact aataccctga ccatcgacaa tagtaccgtt 1440
aaaatgtctg atgatgtttc tgcgggctgg ggtttagaag atgctgcact gtacgcaaat 1500
accatcaccg taactaacga cggtctgttg gatattaacg ttgatcagtt cgatgctaac 1560
ccgttccagg ccgataccct gaatctgacc agtaccactg atactaacgg caacattcac 1620
gctggtgtat tcgatatcca tagcagtgat tacgtaatgg ataccgatct ggtcaacgat 1680
cgtaccaacg atactaccaa gtcaaactac ggttatggct taatcgcaat gaactctgat 1740
ggtcacctga ctattaacgg taacggcgat aacgacaaca ctgcttctat cgaagctggt 1800
cagaacgaag ttgataacaa cggtgaccat gttgcagccg cgaccggtaa ctacaaagtt 1860
cgtatcgaca acgctactgg tgctggttct atcgctgact acaacggcaa cgagctgatc 1920
tacgtcaacg acaaaaacag caacgcgacc ttctctgctg ctaacaaagc tgacctgggt 1980
gcatacacct atcaggctga acagcgcggt aacaccgttg ttctgcaaca gatggagttg 2040
accgactacg ctaacatggc gctgagcatc ccatctgcga acaccaatat ctggaacctg 2100
gaacaagaca ccgttggtac tcgtttgacc aactctcgtc atggcctggc tgataacggc 2160
ggcgcatggg taagctactt cggtggtaac ttcaacggcg acaacggcac catcaactat 2220
gatcaggatg ttaacggcat catggtcggt gttgatacca aaattgacgg taacaacgct 2280
aagtggatcg tcggtgcggc tgcaggcttc gctaaaggtg acatgaatga ccgttctggt 2340
caggtggatc aagacagcca gactgcctac atctactctt ctgctcactt cgcgaacaac 2400
gtctttgttg atggtagctt gagctactct cacttcaaca acgacctgtc tgcaaccatg 2460
agcaacggta cttacgttga cggtagcacc aactccgacg cttggggctt cggtttgaaa 2520
gccggttacg acttcaaact gggtgatgct ggttacgtga ctccttacgg cagcatttct 2580
ggtctgttcc agtctggtga tgactaccag ctgagcaacg acatgaaagt tgacggtcag 2640
tcttacgaca gcatgcgtta tgaactgggt gtagatgcag gttatacctt cacctacagc 2700
gaagatcagg ctctgactcc gtacttcaaa ctggcttacg tctacgacga ctctaacaac 2760
gataacgatg tgaacggtga ttccatcgat aacggtactg aagggtctgc ggtacgtgtt 2820
ggtctgggta ctcagttcag cttcaccaag aacttcagcg cctataccga tgctaactac 2880
ctcggtggtg gtgacgtaga tcaagactgg tccgcgaacg tgggtgttaa atatacctgg 2940
taa 2943



<212> Type : DNA 

<211> Length : 2943 

SequenceName : SEQ ID 401 
SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaactca aacatgttgg tatgattgtc gtttctgtgt tggcgatgtc gtctgctgcg 60

gtaagcgcag ccgagggtga tgaatcagta acgaccactg ttaatggcgg tgttattcat 120 

tttaaaggtg aagtggtaaa tgccgcttgt gcgattgatt ccgaatcaat gaaccaaacg 180

gttgagctgg gtcaggttcg ttcttctcgc ctggctaaag cgggtgacct cagctccgcc 240

gttggcttca atatcaagct gaatgattgt gataccaatg tttccagtaa tgcagctgtt 300

gcattcctgg gtactactgt caccagtaat gacgatacgt tagcgctgca gagttcagcg 360 

gcaggctctg cccaaaatgt cggtattcaa attttggacc gtacgggtga ggtattaata 420 

cttgatgggg ccacttttag tgctaaaacc gacttgattg atggcacgaa tatactacca 480 
ttccaggctc gttatattgc tctcgggcag tccgtagctg gtactgcaaa cgcagatgcg 540

accttcaaag ttcaatatct ataa 564
<212> Type : DNA
<211> Length : 564 

SequenceName : SEQ ID 402 

SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaacttt taaaagtagc agcaattgca gcaatcgtat tctccggtag cgctctggca 60 
ggtgttgttc ctcagtacgg cggcggtggc ggtaaccacg gtggtggcgg taataacagc 120 
ggcccgaatt cagagctgaa tatttatcag tacggtggtg gtaactctgc acttgctctg 180 
caagctgatg ctcgtaactc tgatcttact attacccagc atggtggtgg taacggtgca 240 
gatgttggtc agggctcaga tgacagctca atcgatctga cccaacgtgg ctttggtaac 300

agcgccactc ttgatcagtg gaacggtaaa gactctcata tgacagttaa acaattcggt 360 
ggcggcaacg gtgcagcggt tgaccagact gcatctaatt ccaccgtcaa cgtaactcag 420 
gttggctttg gtaacaacgc gaccgctcat cagtactaa 459 
<212> Type : DNA 
<211> Length : 459 

SequenceName : SEQ ID 403 

SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgcctattg gtaatcttgg tcataatccc aatgtgaata attcaattcc tcctgcacct 60 
ccattacctt cacaaaccga cggtgcaggg gggcgtggtc agctcattaa ctctacgggg 120
ccgttgggat ctcgtgcgct atttacgcct gtaaggaatt ctatggctga ttctggcgac 180

aatcgtgcca gtgatgttcc tggacttcct gtaaatccga tgcgcctggc ggcgtctgag 240

ataacactga atgatggatt tgaagttctt catgatcatg gtccgctcga tactcttaac 300

aggcagattg gctcttcggt atttcgagtt gaaactcagg aagatggtaa acatattgct 360 

gtcggtcaga ggaatggtgt tgagacctct gttgttttaa gtgatcaaga gtacgctcgc 420

ttgcagtcca ttgatcctga aggtaaagac aaatttgtat ttactggagg ccgtggtggt 480 

gctgggcatg ctatggtcac cgttgcttca gatatcacgg aagcccgcca aaggatactg 540 

gagctgttag agcccaaagg gaccggggag tccaaaggtg ctggggagtc aaaaggcgtt 600

ggggagttga gggagtcaaa tagcggtgcg gaaaacacca cagaaactca gacctcaacc 660

tcaacttcca gccttcgttc agatcctaaa ctttggttgg cgttggggac tgttgctaca 720

ggtctgatag ggttggcggc gacgggtatt gtacaggcgc ttgcattgac gccggagccg 780 

gatagcccaa ccacgaccga ccctgatgca gctgcaagtg caactgaaac tgcgacaaga 840 

gatcagttaa cgaaagaagc gttccagaac ccagataatc aaaaagttaa tatcgatgag 900 

ctcggaaatg cgattccgtc aggggtattg aaagatgatg ttgttgcgaa tatagaagag 960

caggctaaag cagcaggcga agaggccaaa cagcaagcca ttgaaaataa tgctcaggcg 1020

caaaaaaaat atgatgaaca acaagctaaa cgccaggagg agctgaaagt ttcatcgggg 1080 

gctggctacg gtcttagtgg cgcattgatt cttggtgggg gaattggtgt tgccgtcacc 1140 

gctgcgcttc atcgaaaaaa tcagccggta gaacaaacaa caacaacaac tactacaact 1200

acaactacaa gcgcacgtac ggtagagaat aagcctgcaa ataatacacc tgcacagggc 1260 

aatgtagata cccctgggtc agaagatacc atggagagca gacgtagctc gatggctagc 1320

acctcgtcga ctttctttga cacttccagc atagggaccg tgcagaatcc gtatgctgat 1380

gttaaaacat cgctgcatga ttcgcaggtg ccgacttcta attctaatac gtctgttcag 1440

aatatgggga atacagattc tgttgtatat agcaccattc aacatcctcc ccgggatact 1500 

actgataacg gcgcacggtt attaggaaat ccaagtgcgg ggattcaaag cacttatgcg 1560 

cgtctggcgc taagtggtgg attacgccat gacatgggag gattaacggg ggggagtaat 1620

agcgctgtga atacttcgaa taacccacca gcgccgggat cccatcgttt cgtctaa 1677 

<212> Type : DNA 
<211> Length : 1677

SequenceName : SEQ ID 404

SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgttttcta ctttcaaaaa agcagctctg ctggcagcta ttgcattacc tttttcaact 60 

atggctgcgc ctacagtcac ttttcagggt gaagtaaccg atcagacctg ttccgtaaat 120 

atcaacggtc aaaccaattc agtagtattg atgccgaccg tagccatggc tgacttcggt 180 

gcaactttag ctgatggtca gagcgcaggc cagacgccgt ttacggtttc tgtgtctaac 240

tgccaggctc caactggtgc agatcaggca atcaacacca ccttcctggg ctacgacgtt 300 

gacgctagca cgggtgttat gggaaaccgt gataccagca gcgatgcggc gaaaggcttt 360 

ggcattcagt taatggattc cagcacttct ggtaacccag taactctggc tggcgcgact 420 

aacgtaccgg gtctgaccct gaaagttggc gataccgaag ccagctacga cttcggtgcg 480 

cgttacttcg ttatcgatag cgctgctgcc actgccggta aaattaccgc tgtcgcagaa 540

tacaccctga gctacctcta a 561
<212> Type : DNA
<211> Length : 561 

SequenceName : SEQ ID 405 

SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

atgaacagtg aaggaggaaa accggggaat gtactgaccg ttaacggcaa ctataccgga 60 

aacaatggcc tgatgacgtt caacgcgacg ctgggcggcg ataattcacc caccgataag 120 

atgaacgtga aaggcgatac ccaagggaac actcgcgttc gggttgataa cattggcggc 180 

gtcggtgcgc aaacggtcaa cggtattgaa ctcattgagg ttggcggtaa ttctgcaggt 240 

aatttcgcgc tgaccaccgg aactgtcgaa gctggggctt acgtctacac gctggctaaa 300

gggaagggga atgacgagaa aaactggtat ctgaccagta aatgggacgg cgtaacgcca 360 

gcggatacac ccgatcccat caataatccc cctgttgtgg atccggaagg cccatcagtt 420 

tatcgcccgg aggccggaag ctatatcagc aacattgccg cagccaactc gctgtttagc 480

catcgcttac acgaccgtct gggtgagccg caatatacag attcactgca ttctcaggat 540 

tcagcaagca gtatgtggat gcgtcatgtc ggggggcacg aacgttccag tgccggagac 600

ggccagctaa atactcaggc taaccgctat gtattgcagc taggcggcga tttggcgcag 660 

tggagtagca acgcgcagga tcgctggcat cttggcgtga tggcaggcta cgccaatcag 720
cacagtaata ctcagagtaa tcgtgtgggt tataaatcgg atgggcgcat cagcggttac 780
agcgctgggc tgtacgcgac ctggtatcag aacgatgcga ataagaccgg cgcttatgtt 840
gacagctggg cgctgtataa ctggtttgat aacagcgtca gttccgataa ccgttctgct 900
gacgactatg attctcgcgg tgtgacggcc tctgttgagg gtgggtatac ctttgaagcg 960
ggaacatgta gcggcagcga agggacgctg aatacctggt acgtccagcc acaggcgcaa 1020
atcacctgga tgggtgtgaa agattctgac catgcccgga aagacggaac gcgcattgaa 1080
acggaaggcg acggaaacgt gcaaacgcga cttggggtga aaacctacct gaatagccat 1140
caccagcgtg acgatggtaa acagcgtgag ttccagcctt acattgaagc gaactggatc 1200
aacaatagca aagtctacgc cgtgaagatg aatggtcaaa ccgtaagccg tgatggtgcg 1260
cgaaatctcg gtgaagtacg taccggggtt gaggcgaaag taaataacaa ccttagcctg 1320
tgggggaatg tcggtgtgca actaggtgat aaaggctata gcgatactca gggcatgctg 1380
ggagtgaaat atagctggta a 1401
<212> Type : DNA
<211> Length : 1401

SequenceName : SEQ ID 406

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgtcatatc tgaatttaag actttaccag cgaaacacac aatgcttgca tattcgtaag 60
catcgtttgg ctggtttttt tgtccggctc tttgtcgcct gtgcttttgc cgtacaggca 120
cctttgtcat ctgccgaact ctattttaat ccgcgctttt tagcggatga tccccaggct 180
gtggccgatt tatcgcgttt tgaaaatggg caagaattac cgccagggac gtatcgcgtc 240
gatatctatt tgaataatgg ttatatggca acgcgtgatg tcacatttaa tacgggcgac 300
agtgaacaag ggattgttcc ctgcctgaca cgcgcgcaac tcgccagtat ggggctgaat 360
acggcttctg tcgccggtat gaatctgctg gcggatgatg cctgtgtgcc attaaccaca 420
atggtccagg acgctactgc gcatttagat gttggtcagc agcgactgaa cctgacgatc 480
cctcaggcat ttatgagtaa tcgcgcgcgt ggttatattc ctcctgagtt atgggatccc 540
ggtattaatg ccggattgct caattataat ttcagcggaa atagtgtaca gaatcggatt 600
gggggtaaca gccattatgc atatttaaac ctacagagtg ggttaaatat tggtgcgtgg 660
cgtttacgcg acaataccac ctggagttat aacagtagcg acagatcatc aggtagcaaa 720
aataaatggc agcatatcaa tacctggctt gagcgagaca taataccgtt acgttcccgg 780
ctgacgctgg gtgatggtta tactcagggt gatattttcg atggtattaa ctttcgcggc 840
gcacaattgg cctcagatga caatatgtta cccgatagcc aaagaggatt tgccccggtg 900
atccacggta ttgctcgtgg tactgcacag gtcactatta aacaaaatgg gtatgacatt 960
tataatagta cggtgccgcc ggggcctttt accatcaacg atatctatgc cgcaggtaat 1020
agtggtgact tgcaggtaac gattaaagag gctgacggca gcacgcagat ttttaccgta 1080
ccctattcgt cagtcccgct tttgcaacgt gaagggcata ctcgttattc cattacggca 1140
ggagaatacc gtagtggaaa tgcgcaacag gaaaaacccc gctttttcca aagtacatta 1200
ctccacggcc ttccagctgg ctggacaata tatggtggaa cgcaactggc agatcgttat 1260
cgtgctttta attttggtat cgggaaaaat atgggggcac tgggcgctct gtctgtggat 1320
atgactcagg ctaattccac acttcccgat gacagtcagc atgacggaca atcggtgcgt 1380
tttctctata acaaatcgct caatgagtca ggcacgaata ttcagttagt gggttaccgt 1440
tattcgacca gcggatattt taatttcgct gatacaacat acagtcgaat gaatggctac 1500
aacatcgaaa cacaggacgg agttattcag gttaagccga aattcaccga ctattacaac 1560
ctcgcttata acaaacgcgg gaaattacaa ctcaccgtta ctcagcaact cgggcgctca 1620
tcaacactgt atttgagtgg tagccatcaa acttattggg gaacgagtaa tgtcgatgag 1680
caattccagg ctggattaaa tactgcgttc gaagatatca actggacgct cagctatagc 1740
ctgacgaaaa acgcctggca aaaaggacgt gatcagatgt tagcgcgtaa cgtcaatatt 1800
cctttcagcc actggctgcg ttctgacagt aaatctcagt ggcgacatgc cagtgccagc 1860
tacagcatgt cacacgatct caacggtcgg atgaccaatc tggctggtgt atacggtacg 1920
ttgctggaag acaacaacct cagctatagc gtgcaaaccg gctatgccgg gggaggcgat 1980
ggtaatagcg gaagcacagg ctacgccacg ctgaattatc gcggtggtta cggcaatgcc 2040
aatatcggtt acagccatag cgatgatatt aagcagctct attacggagt cagcggtggg 2100
gtactggctc atgccaatgg cgtaacgctg gggcagccgt taaacgatac ggtggtgctt 2160
gttaaagcgc ctggcgcaaa agatgcaaaa gtcgaaaacc agacgggggt gcgtaccgac 2220
tggcgcggtt atgccgtgct gccttatgcc actgaatatc gggaaaatag agtggcgctg 2280
gataccaata ccctggctga taacgtcgat ttagataacg cggtcgctaa cgttgttccc 2340
actcgtgggg cgatcgtgcg agcagagttt aaagcgcgcg ttgggataaa actgctcatg 2400
acgctaaccc acaataataa gccgctgccg tttggggcga tggtgacatc agagagtagc 2460
cagagtagcg gcattgttgc ggataatggt caggtttacc tcagcggaat gcctctagcg 2520
ggaaaagttc aggtgaaatg gggagaagag gaaaatgctc attgtgtcgc caattatcaa 2580
ctgccaccag agagtcagca gcagttatta acccagctat cagctgaatg tcgttaa 2637

<212> Type : DNA
<211> Length : 2637

SequenceName : SEQ ID 407 
SequenceDescription : 



Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgcagataa tctttggaga aaaatgcgtg tcattactac gactattttt tgccgccgtc 60 

ttaatgctat ggtgcgctca aaccgctgct tatagcgggc agtgtcatac cactcagggg 120

aatccgtata ttggcgtcaa ttttggcgtt aaaaccctgg aggaagaaga aaatacgact 180 

ggggtagtaa aagacaaatt ttatcagtgg aacgaatcga atgattatta tgtttcctgt 240

gattgcgata aagacaatgt cagaagtggc cgatgggcat tcgccgcgga ttcaccgtta 300

gtctatttag gcgacaactg gtacaaaatt aatgactatc ttgccgccaa agttttattg 360

caggttaaag gcagttctcc tacagcggtt cctttcgaaa acgtggggac tggggcagat 420

acccggtggc atatttgtga ccccggcggt caacgtttag gcggccaggg agctagcggt 480 

aatagcggta gcttttccct gaaaatattg cagccgttcg ttggttcggt cgtcattcct 540 

cctatggcgc tggcgcgatt atttgaatgc tacaacatac ccgcaggtga ttcctgcacg 600 

actacaggca caccggtttt agtgtattac ctgtctggta ctatcaattc acttggctca 660 

tgttccgtca atgccggaga aacaatcgag gtcgatctgg gcgacgtatt tgcggctaac 720

tttcgtgttg tagggcataa gcctcttggg gccagaacgg cagaacttgc aattccagtc 780 

aggtgtaaca cgggaaacgc ggggttagtt aacgtcaacc tgagtctgac ggcaaccaca 840 

gaccccagct atccccaggc gattaagacg tcacgtcctg gcgtgggcgt ggtggtgacc 900

gatagccaga acaacattat ttcccctgct ggtggaacat taccgctctc tattcctgat 960 

gatgcagaca gtatcgcgtg a 981
<212> Type : DNA 
<211> Length : 981 

SequenceName : SEQ ID 408 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaaatta aaactctggc aatcgttgtt ctgtcggctc tgtccctcag ttctacagcg 60
gctctggccg ctgccacgac ggttaatggt gggaccgttc actttaaagg ggaagttgtt 120
aacgccgctt gcgcagttga tgcaggctct gttgatcaaa ccgttcagtt aggacaggtt 180
cgtaccgcat cgctggcaca ggacggagca accagttctg ctgtcggttt taacattcag 240
ctgaatgatt gcgataccaa tgttgcatct aaagccgctg ttgccttttt aggtacggtg 300
attgatgcgg gtcataccaa cgttctggct ctgcagagtt cagctgcggg tagcgcaaca 360
aacgttggtg tgcagatcct ggacagaacg ggtgctgcgc tgacgctgga tggtgcgaca 420
ttcagtgagc aaacaaccct gaataacggt actaacacca ttccgttcca ggcgcgttat 480
tatgcaatcg gcgaggcaac cccgggtgct gctaatgcgg atgcgacctt caaggttcag 540
tatcaataa 549

<212> Type : DNA

<211> Length : 549

SequenceName : SEQ ID 409
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaattaa aagtcatcgc tacactgatt gctactgttg ccgtgggtgt aagctttaac 60 

agcaattttg cttctgcgag tacaacgtcc gcttctttaa ccgtaaacag taacctgact 120

atgggtacct gcagtgctca gataatggat aatagtaata aagtgatcaa tgaagtggtc 180 

tttggcaatg tttatatttc tgaactcggt gcaaaaagca aagtgcaaca gtttaaaatt 240 

cgctttagca attgctctgg ccttccccaa aacagcgccc aaatagtgct ggcacctaat 300

ggtatatcct gtgctggttc tcaatcgtca tcggcgggtt tttctaacaa gtttactgac 360

gctagcgcag caaccagaac ggctgtggaa gtatggacta cagatacacc ggaaagcaat 420

ggcagtacgc aattccattg tgctcaaaag ataccagtgc ctgtgacgct tcccgccgac 480

accacaactc agccttacga ttacccgtta agtgcacgga tgaccgttgc ggaaggtaga 540 

ttggtaaccg atgtaagacc gggtaatttc cgctctccca cgactttcac gatcacttat 600 

cagtaa 606
<212> Type : DNA

<211> Length : 606 

SequenceName : SEQ ID 410 
SequenceDescription :



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

ttggcatcaa cagttgagta tggtgagaca gttgatggtg ttgtcctgga aaaagatatc 60 

cagctggttt atgggaccgc caataatacg aaaatcaatc ctggcggaga acagcatatt 120 

aaagaatttg gtataagtag taatactgaa attaacggcg ggtatcagta cattgaaatg 180

aatggcaccg cagaatactc agtattaaat gatggttatc aaattgttca aatgggtggc 240

gcggcaaacc agactacgct caataatggt gtgctacagg tttatggcgc agcgaatgat 300

cccacgatta aaggcgggcg cttaatcgtt gaaaaagatg ggattaccgt ccttgccgct 360 

atcgaaaagg gaggattact ggaggttaaa gaggggggat tagcgattgc ggtagatcag 420 

aaagcaggcg gtgctattaa agcaagcacg cgggtcatgg aggtattcgg aacaaaccgt 480 

ctcggtcagt tcgaaatcaa gaatggtatt gctaacaata tgctgttgga aaacggcgga 540

agtttgcgag ttgaagaaaa tgacttcgct tataatacta ctgtagatag tggcggctta 600 

ctggaggtta tggatggcgg gactgcaact ggcgttgata aaaaagcagg cggaaaatta 660 

attgtctcaa cgaatgcgct ggaagtgagt ggtacaaaca gtaaaggcca atttagtata 720

aaagatggtg tgtcaaaaaa ttatgaactg gatgatggtt ccgggcttat tgttatggag 780 

gacacgcagg ccattgacac tatcctcgat gagcatgcca ctatgcaatc gctgggaaag 840

gatactggta cgagagtgca ggcaaatgcg gtatatgatc tcggtcgatc agatcagaat 900 

ggaagtataa cgtattcctc taaagccatc tctgaaaata tggttatcaa caatggccgc 960 

gctaacgtct gggctggcac aatggttaac gtgtcagtca gaggaaatga tggcattctt 1020

gaggttatga agccgcaaat aaattatgca cccgcaatgt tggtgggtaa ggtagtggtt 1080 

tctgagggcg cttctttaag aacgcatggt gccgtggata ccagcaaagc ggatgtttcg 1140

ctcgaaaata gcgcatggac catcattgcc gatatcacta cgacgaacca aaacacccgc 1200

cttaacttag ccaaccttgc gatgtctggc gcaaatgtga ttatgatgga tgagtcagtg 1260

actcgttcat ctgtgacggc aagtgcggaa aatttcacta cgttgaccac caataccctg 1320

tcgggaaacg gcaattttta tatgcgtacc gatatggcga atcatcagag cgatcagctc 1380

aacgtcaccg gtcaggcaac aggtgatttc aaaatattcg tgacggacac cggtgccagc 1440

ccggcagcag gagatagcct tacactggta acaacgggcg gcggtgatgc tgcatttacg 1500 

ttgggcaatg ccggaggcgt tgttgatatc ggtacgtatg aatatacctt gctggataat 1560 

ggtaaccata gctggagtct ggcagagaat cgcgcgcaaa ttaccccttc aaccactgat 1620 

gtgctgaata tggcggccgc acaaccgctg gtatttgatg cagaactgga caccgtgcgt 1680

gagcgtcttg gtagcgtaaa aggcgttagt tacgatacgg cgatgtggag ttcggcaatt 1740

aacacccgca acaacgtgac cactgatgcg ggagctggtt ttgagcaaac attgacgggc 1800

ctgacgctcg gtatcgatag ccgtttctcc cgtgaagaaa gcagcacaat tcgcggcttg 1860

ttctttggtt actctcattc tgatattggt tttgatcgcg gcggcaaagg caatgtcgat 1920

agctataccc tgggggctta tgccggttgg gagcatcaga acggtgccta tgttgatgga 1980

gtggtgaaag ttgaccgttt tgccaacacc atccatggca agatgagtaa tggggcaaca 2040

gcgtttggcg attacaatag taacggcgcg ggtgctcatg tcgagagcgg gttccgttgg 2100 

gttgacggat tgtggagtgt tagaccctat ctggccttta ccggctttac cacagatggt 2160 

caggactaca cgttatcaaa cggcatgcgc gcggatgtgg gaaatacccg gatattacgc 2220 

gctgaagcgg gaacggcggt aagctatcac atggacctgc aaaacggtac gacgctggaa 2280

ccctggctga aagccgccgt gcgtcaggaa tacgccgatt ctaaccaggt gaaagttaat 2340

gacgatggca aatttaataa tgatgtggct ggaacccgtg gcgtttatca ggctgggata 2400 

aggtcatcgt ttaccccgac gttaagcggt catttgtcag tcagctatgg caatggcgca 2460 

ggggtagaat cgccgtggaa tacccaggcg ggtgtggtct ggacgttctg a 2511 

<212> Type : DNA

<211> Length : 2511 

SequenceName : SEQ ID 411 
SequenceDescription : 



Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgcaaagga aaggcaataa actgttgatt cagttatgca gtgtgatact gctatttttt 60
accacatcct ggtatgcatt ggcgaatgaa tgttatatag agagaaatgc tgaaggggat 120
tatcacatga agataagctc tactcagctt agtctggcgt cacaaatggt cgaggttccg 180
acagaaatag ccgaagctac atgggatgta aatattcaac taagaggcga tgccataggg 240
tgtaaatctc ttggggatag taaggcagtt cactttctta atacagctga cccaagttta 300
atatccacgt acaccacaac gaatggcgca gcgttattaa aaacaactgt tccaggcatt 360
gtgtattctg tcgagttatt atgccttagt tgtggtgccg cagatgaact tgatttatgg 420
ctacctgcac aaagtggcgc agataacttc ataccaagca cccagacgaa atgggcctat 480
gagtacagtg atcaaagttg gtatttacgt tttcgcttat tcataactcc tgaatttaaa 540
cccaagaatg gtgtttccag cggaacaacg atagcaggaa agattgcgtc atggtatata 600
ggtaccaatg accagccgtg gatcaacttt tacattgaca atgactcttt aaagtttttc 660
gtcgatgaac cgacctgtgc aacagttgcc ctggcacaag atcagggcaa cgtcagtggc 720
aatcaggtaa cgcttgggaa cagctatgtt tcggaagtga aaaatgggct tacgcgggaa 780
atcccttttt ctatccgtgc tgaatactgt tatgccagta aaattacggt taagttgaaa 840
gcggcaaata aacccagcga tgccacactg gtgggtaaaa cgactggctc ggcttcaggc 900
gtggctgtaa aagtaaattc aacttatgac aatagcaaag tattgttaaa agcagatggt 960
agcaacacgg ttgactacaa cttcgccgcc tggtcaaaca acctgctgtt tttacctttt 1020
acggcgcagc tggtaccgga tggtagcggt aatgctgtcg gtgttggaac attttcaggt 1080
aacgcgacct tctcctttac ctacgaataa 1110
<212> Type : DNA 
<211> Length : 1110 

SequenceName : SEQ ID 412 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

ttgtaccagt ttactcatca aaaaagccgt atcccgaaaa aaacgctact tgcggcctgt 60

tgtgccctgt tttatagcag caacggtgct gcggcggaca ccgtggaata tgacagttcc 120 

tttttaatgg gaactggcgc atcaacgatt gatgttaaac gttatgctca aggcaacccg 180

acaccgccgg gtctctataa tgtccgcgta tttgtaaacg gtcaggcgac ttccagctta 240

gaaattccgt ttgtggatat tggcgaaaac agtgcggcgg cctgtcttac ccataaaaac 300

ctggcgcaac ttcacattaa gcaacctgaa cagcctgtca ctttactcgc cagagaaggt 360

gaagaagagg attgtctgga tctggcaaag tcatacgaaa aggcggatgt gtgctttgac 420 

ggtagtgacc agtttctcga tctgacgatc cctcaggcct atgttctgaa aagctatggc 480 

ggctacgttg acccttcttt atgggaatcg ggaattaacg ctgccacact ggcatatacc 540 

ctgaacgcgt atcacacaag ttcagataac gacaatagtg acagcgtcta tggcgcgttc 600 

aactcaggta tcaatttagg agcctggcac tttcgtgcgc gcggtaacta taactggaca 660

acagataacg gcagcgattt cgatttccag gatcgttact tacagcgtga cattccggca 720 

atccgttccc agataattat gggtgatgcc tataccaccg gtgaaacgtt tgactctgtc 780 

aacgtccgtg gtgttcgcct gtacagcgac agccgtatgc tgccttcggc gctggccagt 840 

tacgctccga ccatccgcgg tgtagcaaac tccaacgcca aagtcaccgt gacgcaaagc 900 

ggatataaaa tttatgaaac caccgttccg cccggtgaat ttgttataga cgacattagc 960

ccttccggct ttggtagcga actggtcgtg accattgaag aagcggatgg ttccaaacgc 1020 

acctttacgc aacccttctc gtcggttgta caaatgcaac gtcctggtgt gggccgttgg 1080

gatttcagcg cgggtaaagt cattgatgac agtctgcgat ccgaacccaa tatggggcaa 1140 

gcctcttatt actatggtct gaataacctc ttcacgggtt ataccggcat tcagttcacc 1200

gataataact atcttgccgg gctgttaggt gtgggtatca acaccagcat cggcgccttt 1260

gcggtagacg ttacccattc ccgtgctgaa attccggatg ataaaaccta ccaggggcaa 1320 

agttatcgcg tgacctggaa caaacttttc caggataccg ggacatcatt taacctcgcg 1380

gcgtaccgct attccaccca ggattacctg ggcctgcatg atgcgttagt cctcattgac 1440 

gacgccaagc atttgtctgc cgatgaagac aaaaacacca tgcagacgta ctcacgtatg 1500 

aaaaaccagt ttaccgtcag cattaaccag ccattgaata tcgcctatga agattacggt 1560

tcgctgttta tttccggtag ctggacgtat tactgggcgg cgaacaatag ccgcactgaa 1620 

tataatgttg gttacagtaa aagcgtttcg tggggcagtt tcagcgtcaa cctacaacgt 1680 

agctggaatg aagacggcga gaaagatgac gcgatgtacg tcagcgttag cgtacctatt 1740 

gagaatattt taggtggcaa acgtaagtct tctggtttcc gcaatttaaa tactcagctc 1800 

aataccgatt tcgatggttc acatcagttg aatgttaaca gttccggtaa cactgaaaac 1860

aatctggtga actacagtgt caacgcaggt tatagcctcg ataaaaacgc cggcgattta 1920 

gcctctgttg gtggttatct caactatgaa tctgggttag gcggtatttc cgcttcggcc 1980 

tcggccactt ctgataacag ccaacagtac tccatctcaa ccgatggcgg ctttgtatta 2040

cacagtggtg gtttaacgtt cactaacaac agtttcagca gtaacgacac gctggtgtta 2100 

atcaacgccc taggtgctaa aggcgcacga atcaataaca gtaataacga aatcgatcgc 2160

tggggatatg ccgtgacgtc ctctgtcagc ccatatcgtg aaaaccgggt aggtctgaac 2220 

attgaaacac tggaaaacga tgttgaactg aaaagtacca gcgccaccac cgtaccacgt 2280

agcggctccg ttgttttgac ccgtttcgaa actgacgagg ggcgttctgc cgtgctgaat 2340 

attactgccg ccaatggcaa atccattccg tttgctgcgg aggtttacca gggtgaggtg 2400 

atgatcggca gcatgggcca gggtggtcag gcatttgtac gcggtattaa cgacagcggg 2460

gaattaatcg tgcgctggta tgaaaacaac caaaccattg actgtaagtt gcactaccag 2520

ttcccggcgc agccacaaac gcagggaagc accaacacct tattacttaa caatcttacc 2580 

tgtcaggtag caaatcacta a 2601 
<212> Type : DNA 

<211> Length : 2601

SequenceName : SEQ ID 413 
SequenceDescription : 
Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaagttca aacgattgct gcatagcggc atcgccagtt tgagtctggt tgcctgcggg 60 

gtgaatgcgg cgacggatct tggcccggca ggggatattc atttctccat cactatcacc 120 

actaaagctt gcgagatgga aaaaagcgat ctcgaagtcg atatgggaac aatgacgctg 180 

caaaaacctg cggcagtcgg tacggtgttg agcaagaaag atttcaccat tgaactcaaa 240

gagtgcgatg ggatatccaa agcgaccgtt gagatggaca gtcagtcgga cagcgatgat 300

gattccatgt ttgcccttga ggctggtggc gcaacgggtg ttgcgttgaa gatagaggac 360 

gataaaggaa cgcagcaagt tcccaaaggc tccagcggaa cgccgattga atgggcgatt 420 

gatggcgaaa ccacgtcgct tcactaccag gcgagttatg tggtcgtcaa cactcaggcc 480 

actggtggca cagcgaatgc ccttgtaaat ttttccatca cctatgagta a 531 




<212> Type : DNA 

<211> Length : 531 

SequenceName : SEQ ID 414 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaataca ataacattat tttcctcggt ttatgtctgg ggttaaccac ctattctgct 60

ttatccgcag atagcgttat taaaattagc gggcgcgtcc tcgattatgg ctgcacagtc 120 

tcatcggatt cgcttaattt taccgtagat ctccaaaaaa acagtgccag acaatttcca 180 

acgaccggta gcacaagtcc agccgtccct tttcagatta cgttaagtga atgcagcaaa 240 

gggacaacgg gggttcgggt tgcatttaac ggtattgagg acgcagaaaa taatactctg 300

ttgaaactgg atgagggaag caatacggcc tccggtttag gtatagaaat actggacgga 360

aatatgcgtc cggtgaaact gaatgacctt catgccggga tgcagtggat cccactggta 420

ccagaacaga acaatatttt gccttactcc gctcgtctga agtcaactca gaagtccgtc 480 

aatccgggac tggtgagggc ttcggcaacc tttacccttg aatttcaata a 531 



<212> Type : DNA

<211> Length : 531 

SequenceName : SEQ ID 415 
SequenceDescription : 

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgaaatggc gcaaacgtgg gtatttattg gcggcaatat tggcgctcgc aagtgcgacg 60 

atacaggcag ccgatgtcac catcacggtg aacggtaagg tcgtcgccaa accgtgcaca 120

gtttccacca ccaatgccac ggttgatctc ggcgatcttt attctttcag tctgatgtct 180 

gccggggcgg catcggcctg gcatgatgtt gcgcttgagt tgactaattg tccggtggga 240 

acgtcaaggg tcactgccag cttcagcggg gcagccgaca gtaccggata ttataaaaac 300

caggggaccg cgcaaaacat ccagttagag ctacaggatg acagtggcaa cacattgaat 360

actggcgcaa ccaaaacagt tcaggtggat gattcctcac aatcagcgca cttcccgtta 420

caggtcagag cattgacggt aaatggcgga gccactcagg gaaccattca ggcagtgatt 480 

agcatcacct atacctacag ctga 504 
<212> Type : DNA 
<211> Length : 504 
SequenceName : SEQ ID 416
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgaaaagag cgcctcttat aacaggactt ttgttgatat ccacatcctg cgcttatgcc 60 

tcctcagaag ggtgtggagc tgacagcact agcggtgcga caaattacag cagtgtggtt 120 

gatgatgtta cggtgaacca gacagataac gtgacaggac gggagtttac ctctgcaacg 180 

ctaagtagca ctaactggca atacgcctgt tcctgctctg cgggtaaggc agttaaactt 240

gtctatatgg tcagccccgt acttaccacc actggacatc agacaggata ttacaaactc 300

aatgacagcc tggatattaa aaccatgaac cgccccggaa atcctggaga ctaa 354
<212> Type : DNA

<211> Length : 354 

SequenceName : SEQ ID 417 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

atgaaaaaag cacttctcgc agccgctctg gttatggctt ctggttccgc cctggctgta 60 

gatggtggtc atatcgactt taacggtatg gtacagtccg gtacctgtaa agtgggtgtg 120 

gtagatactg gtatgcatag cgttaccact gatggcgtgg ttaccctgga tactgcgaat 180 

gttactgata cttttgctga agttagcgca actgctgtcg gtttactgcc gaaagagttc 240

atgatttctg ttgagtgtga tccaggtgct ccgaagaatg ctgagttaac tatgggttct 300

gcaagttacg cgaacaccag cggtaccctg aataacaata tgaacatcac tgttaacggt 360

attgcaccgg ctcagaacgt aaacattgca gttcataaca tgaaaaacaa agctggcgct 420 

gctgaaatta agcaggtcca tatgaacaac tcttctgaag ttcaggaact gacattagac 480

gcagaaggta aaggccagta cgtatttaac gcatcttacg ttaaagcacc gaacagcccg 540

gctgtaactg ctggtcatgt aaccactaac gcgctgtaca ccgttgctta taagtaa 597

<212> Type : DNA 
<211> Length : 597 

SequenceName : SEQ ID 418 
SequenceDescription :

Sequence 

<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

atgaaaccaa atatgattgt aggagcatta gcgttaactt ctgtgtttat ggcaggtcac 60 
ctacaggcgg ctgatggaac agtccatttc cgtggtgaaa ttattgacag tacttgcgaa 120

gtcactcctg aaactaaaga tcaggtcgtt gatttaggca aagtaaaccg tacagccttt 180 
agtggcgtcg atgatgtggc tgccccgacg gctttttcta tcgatctgac tcaatgcccg 240 

gaaaccttta agtccgccgc aattcgtttc gatggtaatg aagatgctca tggtaatggc 300

aacctggcaa ttggtacccc gctggataac tctaacgatg ctgccgctgg tattagcccg 360

agtgataaca gtggggatta tactggtgcg ggtgccgtta gtgcagcgaa aggcgtagct 420 
attcgtttat ataaccgtgc agataacact caggtcaagt tatatgaaaa ttctgcatca 480 
actccgattt ctaatggtaa tgcatccatg aagttcatgg ctcgttatat tgctacggaa 540 

acgactattg accctggtac agctaacgcc gactcgcagt ttacagttga atatataaaa 600
taa 603 
<212> Type : DNA 
<211> Length : 603 

SequenceName : SEQ ID 419 

SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

gtgccaattt tccagcgtga aggccatctc aaatatagct ttgccgcagg tgaatatcag 60 

gccgggaatt atgacagcgc ctcgccgcgt ttcgggcagc ttgatctgat ctacggttta 120 

ccgtggggga tgacggccta cggcggcgta ttaatctcta ataattacaa tgcatttaca 180 

ttagggatag ggaaaaactt tggttatatc ggggcgattt ccattgatgt gacgcaggct 240 

aaaagcgaac tgaataacga tcgcgatagc cagggacaat cttatcgttt cttatattcc 300

aagagcttcg aaagcggcac cgatttccgc cttgcgggct atcggtactc taccagcggt 360

ttctatacct tccaggaagc caccgatgtg cgcagtgacg ctgacagcga ctataaccgt 420

tatcacaagc gcagcgaaat acagggtaac ctgacgcagc aattaggggc ctatggctct 480

gtttatttaa atttaacgca gcaggattac tggaacgacg caggtaaaca gaacacggta 540 

tcggcgggtt acaacggacg tattggcaag gtcagttaca gtattgcata tagctggaat 600

aaaagccctg aatgggatga aagcgatcgc ttgtggtctt tcaatatttc cgttccacta 660 

ggccgggcct ggagtaacta tcgcgtcacg accgaccagg atggtcgtac caatcaacag 720 

gttggggtca gcggaacgct gcttgaggat cgcaacctga gctacagtgt ccaggaaggc 780 

tacgccagca acggtgtggg taacagcggt aacgctaacg ttggctatca gggtgggtcc 840 

ggtaatgtca acgtaggcta tagctacggg aaagattacc ggcagctcaa ctacagcgtt 900

cgcggcggcg tgatagttca tagcgaaggc gtgacgcttt cccaaccgct aggcgaaacc 960 

atgacgctca tctccgtacc cggtgcgcgc aatgcccgcg tggtgaataa cggcggcgtt 1020

caggttgact ggatgggtaa cgcgatcgtg ccttatgcca tgccgtatcg tgaaaacgaa 1080

atctcactgc gtagcgattc gttgggtgac gatgttgacg ttgaaaatgc gttccagaaa 1140

gtggtgccaa cgcgtggagc gattgtcaga gcgcgttttg atacccgcgt tggttaccgc 1200 

gtattaatga cgctgcttcg ttccgcgggc agcccggtgc cctttggagc aacggcaacg 1260

ctaatcaccg ataaacaaaa cgaggtgagc agtatcgttg gtgaagaagg acagctctat 1320

attagcggaa tgccagagga aggacgggta ttgattaaat ggggtaatga cgcgtcgcag 1380

caatgcgtgg cgccttataa attatccctg gaattaaaac agggcggaat tattcctgtt 1440 

tcggccaatt gccagtaa 1458 
<212> Type : DNA 
<211> Length : 1458

SequenceName : SEQ ID 420

SequenceDescription : 



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgagtggtt acaccgtcaa gcctcctacc ggagacagca atgagcagac acaatttatt 60
gattatttta atctgttcta cagtaagcgt gatcaggaac aaataagcat ctctcagcag 120
cttggaaatt acggtgcgac atttttcagt gccagtcgcc aaagttactg gaacacgtca 180
cgcagcgacc agcaaatatc atttggatta aatgtgccgt ttggtgatat tacgacttcg 240
ctgaattaca gctattccaa taatatatgg caaaacgatc gggatcattt actcgctttt 300
acgcttaatg ttcccttcag tcattggatg cgtacagaca gtcagtcggc atttcgtaat 360
tcaaacgcca gttacagtat gtcaaacgat ttgaaaggcg gcatgaccaa tctatcgggg 420
gtttatggca ctctgctgcc ggataataac ctgaattata gcgttcaggt cggtaacacc 480
cacggaggta atacatcgtc tggcaccagt ggttacagta ctcttaatta tcgtggagct 540
tacggcaata ctaatgtcgg ttacagtcgg agtggtgaca gcagccagat ttattacgga 600
atgagtggtg ggattattgc tcatgctgat ggcatcacct ttggacagcc gctgggcgac 660
acaatggttc tggttaaggc tcctggcgct gataatgtca aaatagagaa ccagaccgga 720
attcataccg actggcgtgg ctatgccata ttaccatttg cgacagaata tagagaaaat 780
cgtgtcgctc ttaacgcgaa ttcccttgca gataatgttg aactggatga aaccgtggtc 840
actgtcatcc caactcacgg tgctattgcc agagcaacat ttaatgcaca aatcggcggg 900
aaagtattaa tgacgttgaa gtacggtaat aaaagcgttc cattcggtgc aattgtcact 960
cacggagaga ataaaaatgg cagcattgtc gcggaaaacg gtcaggttta tctgactgga 1020
cttccacagt cagggaaatt acaggtttca tggggcaatg ataaaaactc aaactgtatt 1080
gtcgattaca agcttcctga agtctctcct ggaaccttgc tgaaccagca gacagcaatc 1140
tgtcgctaa 1149
<212> Type : DNA
<211> Length : 1149

SequenceName : SEQ ID 421

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgtctgctt tgtatgaacg ttcacagctg acgcaggtga tgatttcatc tgccccggcg 60
actgctgaaa ccatggagaa ggcggaatat ctgcgcctgg actgcaccat caaggaagtc 120
cagttcaccg ccggtcagaa acaggatatt gatgtgacca cgctctgctc cacagagcag 180
gagaacatca acggtctggg ggcgtcgtcc gagatttcca tgtcgggtaa tttttatctg 240
aatcaggccc agaacgccct gcgtgatgcc tatgacaatg acacggtgta tgcgtttaag 300
gtgcagtttc cgtccggtaa gggctttaag ttcctggcgg aagtgcgtca gcacacctgg 360
tcatccggta ccaacggcgt ggtggctgca acgttttcac ttcgcctgaa gggtaaaccg 420
gtgtcctatg tggtaccgct ggcgtttgtg aaaaatctgg ataagacact taccgtgaat 480
accggtgcgc tgctgacaat gtcagtcagt gtcaacgggg gaacgccgcc ttataaacac 540
gcctggaaga aggatggtca gccggtagag ggacagacta ctgacacttt cagtaagcca 600
ggtgcgcagt caggtgataa gggggcttat acctgcgagg taacggattc tgcagaacag 660
ccgcagagca ttacctctga tgcgtgtaca gtaacggtta atggtgcggg cggataa 717

<212> Type : DNA

<211> Length : 717

SequenceName : SEQ ID 422
SequenceDescription :



Sequence



<213> OrganismName : Escherichia coli O157:H7



WO 2005/076010 



136/341 



PCT/IN2005/000037 






<400> PreSequenceString :

atgagaaaca aaccttttta tcttctgtgc gcttttttgt ggctggcagt gagtcacgct 60 

ttggctgcgg atagcacgat tactatccgc ggctatgtca gagataacgg ctgtagtgtg 120 

gccgctgaat caaccaattt tactgttgat ctgatggaaa acgcggcgaa gcaatttaac 180 

aacattggcg cgacgactcc tgtcgttcca tttcgtattt tgctgtcacc ctgtggtaac 240 

gccgtttctg ccgtaaaagt tgggtttacc ggcgttgcag atagccacaa tgccaacctg 300

cttgcacttg aaaatacggt gtcagcggct tcgggactgg gaatacagct tctgaatgag 360

cagcaaaatc agatacccct taatgctcca tcgtccgcga tttcgtggac gaccctgacg 420 

ccgggtaaac caaatacgtt gaatttttac gcccggctaa tggcgacaca ggtgcctgtc 480

actgcggggc atatcaatgc cacggctacc ttcactcttg aatatcagta a 531 






<212> Type : DNA 

<211> Length : 531 

SequenceName : SEQ ID 423
SequenceDescription : 



Sequence 

<213> OrganismName : Escherichia coli 0157 :H7 

20 <400> PreSequenceString : 

atgaataaat ccgttgtgtc aatttctgcg gcaatgttgg 
atggggagcg aaatctcacc cgcaacaccg tcagatgaag 
caactcttcc gcggcagcag atttagtcag tcgtcattag 
tctgttgcac cgggcaatta taaaatggat atctacacca 

25 tggaatgtca cgtttaaaga agccgctgat ggtcgcgttc 
gtcgcggacg cgataggcct caaaacaggg gaagataagg 
acgtttgcta aggaactcgc tcccggcatc accagccaga 
ctggacttat cggtgccaca gagtcaattg attagtcgcc 
agcgagctgg ataccggagc atcgctggcg ttcatgaatt 

30 gttgcctatt cagggcagaa tgctcatagc cagcgttcgc 
ggcatcaacc ttggtgcctg gcaatatcgt cagttatcca 
aaagggaatc agtggaacaa tattcgtagc tatttgcaac 
agccagttaa tgatggggca gcttatcacc agcggaagat 
cacggcgtta gtcfccgcgac cgatgaacgt atgctgccgg 

35 ccgactattc gcggcgtggc cgcaacaaac gccagagtct 
gaaatatatc agaccaccgt ggctcctggc cctttcgaga 
agctacagcg gcgatctgga tgtcaccgtt acggaagcta 
agtgtcccct tttcagccgt accagaatcg atgcgtccag 
gaagtaggta aaacgcagga tagtggtgat gactcgatgt 

40 cacgggatga ctaatacgct gacatttaac agtggttcgc 
gcgctgatgc tgggcggagt ctatggcagt tcgctggggg 
tggtcccatg cgcgtgttcc cgaaagcgaa gcgcagagtg 
tggagtaaaa ctttccagcc tacttcaacc accgtctccc 
accagcggct atcgtgatct ggctgatgtg ctgggagagc 

45 cagtcatggg actccagcca gtggcgtcaa cagtcgcgct 
agccttgcga attacggcaa cctgtttgtg tcaggttcaa 
aagagccgtg atacacagct tcagttaggt tacagcaata 
atgaaccttt ccgtcggacg ccaaagaatg ggcggctata 
cagacggtaa catccctttc attctcattc ccacttggcg 

50 agtcttagca acagctggac ccattcaact gacggtagct 
accggaatgc ttgatgaagc acagaccacc aactacagcc 
caatataagc agacgacgct tagcggaaac atgcaaaaac 
ggattgaacg catcgaaggg ccaggattac tggcaggctt 
atggctgtgc atggtggcgg cattactttc ggaccttatc 

55 gtcgaagcta aaggcgcaga aggtgcaaaa gtctataact 
gacagtggct atgcgcttgt tccggcagta acgccctatc 
gatccacaag gaatggatgg cgatgccgag ttggtcgaca 
gttgcgggtg cggcggtgaa agtaattttc cgtacccgtc 
aaatcccgca tggcagatgg ttcggaactg ccaatgggag 

60 aatacagtcg tcggtatagc cggtcagggg gggcaaattt 
aaaggccact tgtcagttcg ctggggtgaa ggtgctaacg 
gatatcagcg ggaaggacag caatagccct atcatccgcc 
tga 

<212> Type : DNA 
65 <211> Length : 2523 

SequenceName : SEQ ID 424 
SequenceDescription : 



ttttactttg 
acaactacac 
caaaactgac 
acaataagtt 
tgccctgcct 
gggaaaaaga 
cacagttgtc 
ctcgcggcta 
atattgccaa 
tatgggcatc 
acatgacctg 
gcccgctgcc 
ttttctctgg 
actccatgcg 
cggtaatgca 
taaacgacct 
acggcgcagt 
gaacttcccg 
ttggtgacct 
gtatcgctga 
catttggggc 
gttggatgtc 
tggcaggtta 
gtcatgctgc 
tcgatcttac 
cacagaacta 
gctttagcca 
aagacaattc 
gcaatggacc 
cgcaattaca 
tgaacgtcat 
gtttttcaca 
caggtaacgt 
tgggtgaaac 
ccagtcagct 
gctacaaccg 
gtgaaagaca 
ctggtaaagc 
ccgatgtgct 
acctccgcac 
atagctgcca 
tgaatgaaac 



ccaaccggtc 

ctttgacccg 

aacacgtgag 

gtcaggcagt 

gacgcctgaa 

tcctgtctgt 

acaattgcgc 

tgttcccccc 

ctattacaac 

atttaatggt 

ggataatgac 

cgccataaat 

actcagttat 

cggctatgcg 

aaacggtcat 

ataccccacc 

cagtcgtttc 

ttataacgtg 

tacctggcag 

tggctaccag 

aaacctcact 

gcaattaacc 

tcgatactct 

cagcaataaa 

gttaagtcag 

ccgtggcggc 

tggcatcagt 

tgatgatatg 

tcgtgtacca 

aagctcgcta 

gcgcgatcaa 

aactaccgtc 

acaaggcgcg 

gttcgccctg 

ggaaattaat 

tatatctctc 

ggtagcaccg 

gttgctgatt 

ggatgagaat 

agaacagaca 

attgcccttt 

ctgtcagtct 



60 
120
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2523 
Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaactcg ccgcctgttt tctgacactc cttcctggct tcgccgttgc cgccagctgg 60

acttctccgg ggttccctgc ctttagcgaa cagggaacgg gaacatttgt cagccacgcg 120 

cagttgccca aaggtacgcg tccactcacg ctaaattttg accagcagtg ctggcagcct 180 

gcagatgcga taaaactcaa tcagatgctt tccctgcaac cttgtagcaa cacgccgcct 240 

caatggcgat tgttcaggga cggcaaatat acgctgcaaa tagaoacccg ctccggtacg 300

ccaacattga tgatttccat ccagaacgcc gccgaaccgg tagcaaacct ggtccgtgaa 360

tgcccgaaat gggatggatt accgctcacg ctggatgtca gcgccacttt cccggaagga 420

gccgccgtac gggattatta cagccagcaa attgcgatag tgaagaacgg tcaaataacg 480 

ttacaacccg ctgctaccag caacggttta ctcctgctgg aacgggcaga aactgacgcc 540

tctgcccctt tcgactggca taacgccacg gtttactttg tgctgacaga tcgtttcgaa 600

aacggcgatc ccagtaatga ccagagttac ggacgtcata aagacggtat ggcggaaatt 660 

ggcacttttc acggcggcga tttacgcggc ctgaccaaca aactggatta cctccagcag 720 

ttgggcgtta atgctttatg gataagcgcc ccatttgagc aaattcacgg ctgggtcggc 780 

ggcggtacaa aaggcgattt cccgcattat gcctaccacg gttattacac acaggactgg 840 

acgaatcttg atgccaatat gggcaacgaa gccgatctac ggacgctggt tgatagcgca 900

catcagcgcg gtattcgtat tctctttgat gtcgtgatga accacaccgg ctatgccacg 960 

ctggcggata tgcaggagta tcagtttggc gcgttatatc tttctggtga cgaagtgaaa 1020 

aaaacgctgg gtgaacgctg gagcgactgg aaacctgccg ccgggcaaac ctggcatagc 1080

tttaacgatt acattaattt cagcgacaaa acaggctggg ataaatggtg gggaaaaaac 1140 

tggatccgta ccgatatcgg cgattacgac aatcctggat tcgacgatct caccatgtcg 1200

ctagcctttt tgccggatat caaaaccgaa tcaactaccg cttctggtct gccggtgttc 1260 

tataaaaaca aaacggatac ccacgctaaa gccatcgacg gctttacccc tcgcgattac 1320

ttaacccact ggttaagtca gtgggtccgc gactatggga ttgatggttt tcgggtcgat 1380

accgccaaac atgttgagtt gcccgcttgg cagcaactga aaaccgaagc cagcgccgcg 1440 

cttcgcgaat ggaaaaaagc taaccccgac aaagcattag atgacaaacc tttctggatg 1500

accggtgaag cctggggcca cggcgtgatg caaagtgact actatcgcca cggcttcgat 1560 

gcgatgatca atttcgatta tcaggagcag gcggcgaaag ctgtcgattg tattgcgcag 1620 

atggatacga cctggcagca aatggcggag aaattgcagg gtttcaacgt gttgagctac 1680 

ctctcgtcgc atgatacccg tctgttccgt gaagggggcg acaaagcagc agagttatta 1740 

ctattagcgc caggcgcggt acaaatcttt tatggcgatg aatcctcgcg tccgttcggt 1800

cctacaggtt ctgatccgct gcaaggtaca cgttcggata tgaactggca ggatgttagc 1860 

ggtaaatctg ccgccaacgt cgcgcactgg cagaaaatca gccagttccg cgcccgccat 1920

cccgcaattg gcgcgggcaa acaaacgaca ctttcgctga agcagggcta cggctttgtt 1980 

cgtgagcatg gcgacgataa agtgctggtc atctgggctg ggcaacagtg a 2031 


<212> Type : DNA
<211> Length : 2031 

SequenceName : SEQ ID 425

SequenceDescription :


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgccacaac gacaccacca gggacataaa cgcacaccga aacagttggc gctcattatc 60

aaacgctgtt tgccgatggt gctcactggc agcggcatgc tttgcactac cgctaacgcc 120 

gaagagtatt atttcgaccc cattatgctg gaaaccacaa aaagtggtat gcaaacaacc 180 

gatctgtcac gtttttcaaa aaaatacgca caactaccag gaacttatca ggttgatatc 240 

tggctgaata aaaagaaggt ttcacagaaa aaaattacat ttaccgccaa tgcagagcaa 300

cttctgcagc cacagtttac ggtagaacaa ctacgtgagc tgggtattaa ggtggatgaa 360

atcccggcgc tggctgaaaa agatgacgat agcgtgatca actcgcttga acaaatcatt 420 

cccggtacag ctgctgaatt tgatttcaat catcagcgac ttaatttgag cattccccaa 480 

attgcactgt accgtgatgc aagaggttac gtctcccctt ctcgttggga cgatggtata 540 

ccaacgctgt ttaccaacta ctcgtttaca ggttctgata accgttaccg ccagggcaat 600 

cgtagccaac gacagtacct aaatatgcaa aatggtgcca attttggccc ctggcgatta 660

cgtaactatt ctacgtggac acgcaacgat caggcgtcaa gctggaacac tatcagtagt 720 

tatttacaac gtgatatcaa ggcgttgaag tctcagttgc ttctgggaga aagcgccacc 780 

agcggcagta ttttttccag ctacaacttt actggcgtgc aactcgcttc cgacgataat 840 

atgttgccaa acagccagcg cggatttgcc ccaacggtac gcggtatcgc aaacagtagt 900 

gcaatcgtga ctatcaggca aaatggttat gtgatctatc aaagcaacgt gccagcgggt 960

gcctttgaaa ttaacgatct ctacccctct tccaacagcg gcgatttaga agtcacgatt 1020 

gaagaaagtg acggtacgca acgtcgcttt atccagcctt attcttcatt acccatgatg 1080 
cagcgacctg ggcatctaaa atatagcgcg accgctggac gctatcgcgc tgatgcaaac 1140

agtgatagca aggaacccga atttgctgaa gccacggcaa tatatggttt gaataatact 1200

tttacgctgt atggcggcct gctcggttct gaagattatt atgcgctggg gatcggtatc 1260 

ggcggcacac ttggcgcact gggcgcgttg tcgatggata tcaacagagc tgacacccaa 1320 

ttcgataacc agcactcttt tcatggctat caatggcgta cgcagtacat caaagatatc 1380

ccggaaacca acaccaatat cgctgtcagc tactatcgct ataccaacga tggctatttt 1440 

agttttgatg aagccaatac ccgcaattgg gactataaca gtcgccaaaa aagtgaaatt 1500 

caattcaaca tcagccagac aatafottgat ggggtaagtc tgtatgcctc cggttcacag 1560 

caagactatt ggggcaataa cgagaaaaac aggaatatct ctgttggggt ttccggccag 1620

caatggggaa ttggttacag cctgaattat caatacagcc gctacactga tcaaaataat 1680

gaccgcgcac tctctttgaa tctcagtatt ccgttagaac gctggttacc gcgtagccgg 1740 

gtttcctatc agatgaccag ccagaaagat cgcccaaccc aacatgaaat gcgtcttgat 1800

ggctcactgc tggatgatgg tcgcctgagc tatagtctgg aacaaagtct ggatgacgat 1860 

aacaaccata acagtagcgt gaacgccagt taccgttcac cttatggaac ctfccagtgcc 1920

ggatacagtt acggtaatga cagtagccaa tacaattacg gcgttaccgg cggcgtggtt 1980

atccatcctc atggtgtgac gctctcgcaa tatctgggca acgcttttgc gcttattgat 2040

gctaacgggg catctggcgt gaggatacaa aactatccgg ggattgctac tgatcccttt 2100 

ggctatgcag tggttcctta tctcacaact tatcaggaaa accgtctctc ggtagatact 2160 

acgcagctgc ccgataacgt cgatcttgaa caaacaacac agtttgtggt gcccaacaga 2220

ggtgcaatgg tagcggcgcg tttcaacgcc aatatcggtt atcgcgtact tgttacagtc 2280

agcgatcgca acggtaaacc gttgcccttt ggcgctcttg ccagcaacga tgatacgggg 2340 

caacaaagta tcgtcgatga gggcggcata ctatatctct ctgggatatc gagtaaatca 2400

caaagctgga ctgtacgctg gggaaatcag gcagatcaac aatgtcagtt tgcttttagt 2460 

acaccggatt cagaaccaac aacctctgta ttacaaggca cagcgcagtg ccattaa 2517


<212> Type : DNA 
<211> Length : 2517 

SequenceName : SEQ ID 426 

SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgatgttca gaaatagaat attactaata tttatattgt gggctaattt tacctgggct 60

gggtgtcgta ctactgcatc attaaatatt acagatggta ttaatgttgg ggagatttta 120

gcgaatgaaa cttcctttag taaaagtgtc gtgtttactg ggatatcttg tgatacgagc 180 

acggataaaa tagtttataa aaatatccaa agtgattggg ttgaagttgg gccttttggt 240 

aatggcgaaa aattaaaggt taaaatagag tctttaggta aaaccagcga cacaattggg 300 

aaatccagca atgcgcaggc agtattacct tatgtggtta aaatagccag aggcacacct 360

gattttactg gagaaagaaa atctacctgg tttatttcag ataccgtgat tgcaaatatt 420

ggcggtgagt catcgtcatc catcgatttt tggttgggta tttgtaaggc attgaagttt 480 

aactggtgtg tgaattatct caccagcaaa ctggcggggg atacatttac gcttgggtta 540

aatatttcct attatcctaa aaatacgacc tgtaagcctg aaaacaccgt tataaaagta 600 

gatgatatcg ccttgttcca gctcagaaat cagggaaaga ttgcggcgaa cagtaaggaa 660

ggaacaatta cgttgaaatg tgataatctt ttcggcgaca aaaaacaagc atcgcggaat 720 

atggttgtat atctttctag cagtgactta gttaaaggaa gtaatactat tttgcgtggt 780 

aaaacagata atggtgtagg gtttgtgttg gatctaacag aaccaccaaa agggactgag 840 

gctgccatta aaatttcggc caacggcgat cagggcgcgg cgacatcatt atggaaaaca 900 

gataaaccag gagtttcatt aaatagcaac attattaata taccagtcat ggccagttac 960

tatgtatatg atgaaaaaaa agttaaatct ggcgcactgg aagcaaccgc attaatcaac 1020

gtgaaatacg attaa 1035
<212> Type : DNA 
<211> Length : 1035 

SequenceName : SEQ ID 427

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

atgattaaaa aagcttcgct gctgacggcg tgttctgtca cagccttttc cgcttgggca 60 

caggatacca gcccggatac tctcgtcgtt actgctaacc gttttgaaca gccgcgcagc 120

actgtgcttg caccaaccac cgttgtgacg cgtcaggata tcgaccgctg gcagtcgacc 180

tcggttaatg atgtgctgcg ccgtcttccg ggcgtcgata tcacccaaaa cggcggttca 240

ggtcagctct catctatttt tattcgcggt acaaatgcca gtcatgtgtt ggtgttaatt 300

gatggcgtac gcctgaatct ggcggggggg agtggttctg ccgaccttag ccagttccct 360 

attgcgcttg tccagcgtgt tgaatatatc cgtgggccac gctccgccgt ttatggttcc 420

gatgcaatag gtggggtggt gaatatcatc acgacgcgcg atgaacccgg aacggaaatt 480

tcagcagggt ggggaagcaa tagttatcaa aactatgatg tctctacaca gcaacaactg 540

ggggataaga cacgagtaac gttgttgggc gattatgccc atactcatgg ttatgatgtt 600

gttgcctatg gtaataccgg aacgcaagcg cagccagata acgatggttt tttaagtaaa 660

acgctttatg gcgcgctgga gcataacttt actgatgcct ggagcggctt tgtgcgcggc 720

tatggctatg ataaccgtac caattatgac gcgtattatt ctccgggttc accattggtc 780

gatacccgta aactctatag tcaaagttgg gacgccgggc tgcgatataa cggcgaactg 840

attaaatcac aactcattac cagctatagc catagcaaag attacaacta cgatccccat 900

tatggtcgtt atgattcgtc ggcgacgctc gatgagatga agcaatacac cgtccagtgg 960

gcaaacaaca tcatcattgg ccacggtaat gttggtgcgg gtgttgactg gcagaagcag 1020

agcacggcac cgggcacagc ttatgttaag gatggatatg atcaacgtaa taccggcatc 1080

tatctgaccg ggctgcaaca agtcggcgat tttacctttg aaggcgcagc acgcagcgac 1140

gataactcac agtttggtcg tcatggaacc tggcaaacca gcgccggttg ggaattcatc 1200

gaaggttatc gcttcattgc ttcctacggg acatcttata aggcaccaaa tctggggcaa 1260

ctgtatggct tctacggaaa tccgaatctg gacccggaga aaagcaaaca gtgggaaggc 1320

gcgtttgaag gcttaaccgc tggggtgaac tggcgtattt ccggatatcg taacgatgtc 1380

agtgacttga tcgattatga tgatcacacc ctgaaatatt acaacgaagg gaaagcgcgg 1440

attaagggcg tcgaggcgac cgccaatttt gataccggac cactgacgca tactgtgagt 1500

tatgattatg tcgatgcgcg caatgcaatt accgacacgc cgttgttacg ccgtgctaaa 1560

cagcaggtga aataccagct cgactggcag ttgtatgact tcgactgggg tattacttat 1620

cagtatttag gcactcgcta tgataaggat tactcatctt atccttatca aaccgttaaa 1680

atgggcggtg tgagcttgtg ggatcttgcg gttgcgtatc cggtcacctc tcacctgaca 1740

gttcgtggta aaatagccaa cctgttcgac aaagattatg agacagtcta tggctaccaa 1800

actgcaggac gggaatacac cttgtctggc agctacacct tctga 1845
<212> Type : DNA
<211> Length : 1845

SequenceName : SEQ ID 428
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaaaaca aattgttatt tatgatgtta acaatactgg gtgcgcctgg gattgcagcc 60

gcagcaggtt atgatttagc taattcagaa tataacttcg cggtaaatga attgagtaag 120

tcttcattta atcaggcagc cataattggt caagctggga ctaataatag tgctcagtta 180

cggcagggag gctcaaaact tttggcggtt gttgcgcaag aaggtagtag caaccgggca 240

aagattgacc agacaggaga ttataacctt gcatatattg atcaggcggg cagtgccaat 300

gatgccagta tttcgcaagg tgcttatggt aatactgcga tgattatcca gaaaggttct 360

ggtaataaag caaatattac acagtatggt actcaaaaaa cggcaattgt agtgcagaga 420

cagtcgcaaa tggctattcg cgtgacacaa cgttaa 456
<212> Type : DNA
<211> Length : 456

SequenceName : SEQ ID 429
SequenceDescription :



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaacattt ttgcatattt actggtactt gtattttcca tgagcatgag cagcagcgcg 60 

tttgccagcg tggtaatgac cggaacccgt attattttcc ctggtgacgc aaaggaaaaa 120

accatccagt tgcgaaatac cagcgatcag ccctatatca ttaatatcca tgttgaggat 180 

gaacgtggtt ctgacaagaa tgtaccgttt atgccaaccc cgcagacatt tcgcatggaa 240

gctgccgcag gtcaggcgtt acgcctgctc tacactggta ataatttacc gcaggatcgc 300

gagtctgttt tctggtttag tttcagtcaa ctaccttatc tgaataagaa tgataaaagt 360

cagaaccagc tcatcctggc cctgactaat cgagtcaaaa ttttctatcg tcccagctcg 420 

attgtcggta aatccagtga cgcacccaaa aacctgactt accaggtaaa acagaaccgc 480 

attgaagtga cgaatcccac gggctattac gtcacaattc gcgccgctga actgcttaat 540

aatggtaaaa aagtccccct cgcgaattcg gtaatgattg ctcctcaaag cacaactgaa 600 

tggacactac cctctggcat cagtgtcgct cccggtgcgc agatccattt agtgaccgtc 660 

aacgactatg gcgtaaatgt tacgtctgag catgccttat aa 702 
<212> Type : DNA 

<211> Length : 702

SequenceName : SEQ ID 430 
SequenceDescription : 



